Unit 3,4,5 CC Notes by NK
Contents
3.1 Cloud Architecture Design
3.2 NIST Cloud Computing Reference Architecture
3.3 Cloud Deployment Models
3.4 Cloud Service Models
3.5 Architectural Design Challenges
3.6 Cloud Storage
3.7 Storage as a Service
3.8 Advantages of Cloud Storage
3.9 Cloud Storage Providers
3.10 Simple Storage Service (S3)
built into the data centers, which are typically owned and operated by a third-party provider. The next section explains the layered architecture design for a cloud platform.
virtual machines or virtual servers along with virtual storage. The abstraction of these hardware resources is intended to provide flexibility to users. Internally, virtualization performs automated resource provisioning and optimizes the process of managing resources. The infrastructure layer acts as a foundation for building the second layer, called the platform layer, which supports PaaS services.
The platform layer is responsible for providing a readily available development and deployment platform for web applications to cloud users, without requiring them to install anything on a local device. The platform layer has a collection of software tools for developing, deploying and testing software applications. This layer provides an environment for users to create their applications, test operation flows, track performance and monitor execution results. The platform must ensure scalability, reliability and security. In this layer, the virtualized cloud platform acts as an "application middleware" between the cloud infrastructure and the application layer of the cloud. The platform layer is the foundation for the application layer.
A collection of all the software modules required for SaaS applications forms the application layer. This layer is mainly responsible for on-demand application delivery. In this layer, software applications include day-to-day office management software used for information collection, document processing, calendaring and authentication. Enterprises also use the application layer extensively in business marketing, sales, Customer Relationship Management (CRM), financial transactions and Supply Chain Management (SCM). It is important to remember that not all cloud services are limited to a single layer; many applications require resources from multiple layers. After all, the three layers are built bottom-up, each depending on the one below it. From the perspective of the user, the services at different levels need different amounts of vendor support and resource management. In general, SaaS requires the provider to do the most work, PaaS is in the middle and IaaS requires the least. The best example of the application layer is Salesforce.com's CRM service, where not only the hardware at the bottom layer and the software at the top layer are supplied by the vendor, but also the platform and software tools for user application development and monitoring.
The NIST team works closely with leading IT vendors, developers of standards, industries and other governmental agencies at a global level to support effective cloud computing security standards and their further development. It is important to note that the NIST cloud reference architecture does not belong to any specific vendor product, service or reference implementation, nor does it prevent further innovation in cloud technology.
The NIST reference architecture is shown in Fig. 3.2.1.
Fig. 3.2.1 : Conceptual cloud reference model showing different actors and entities
From Fig. 3.2.1, note that the cloud reference architecture includes five major actors :
Cloud consumer
Cloud provider
Cloud auditor
Cloud broker
Cloud carrier
Each actor is an organization or entity that plays an important role in a transaction or process, or performs some important task in cloud computing. The interactions between
these actors are illustrated in Fig. 3.2.2.
Now, understand that a cloud consumer can request cloud services directly from a
CSP or from a cloud broker. The cloud auditor independently audits and then contacts
other actors to gather information. We will now discuss the role of each actor in detail.
Example 1 : Cloud consumer requests the service from the broker instead of directly
contacting the CSP. The cloud broker can then create a new service by combining
multiple services or by enhancing an existing service. Here, the actual cloud provider is
not visible to the cloud consumer. The consumer only interacts with the broker. This is
illustrated in Fig. 3.2.3.
Example 2 : In this scenario, the cloud carrier provides for connectivity and transports
cloud services to consumers. This is illustrated in Fig. 3.2.4.
In Fig. 3.2.4, the cloud provider participates by arranging two SLAs : one with the cloud carrier (SLA2) and the other with the consumer (SLA1). Here, the cloud provider has an arrangement (SLA) with the cloud carrier to provide secure, encrypted connections. This ensures that the services are available to the consumer at a consistent level to fulfil service requests. The provider can specify requirements, such as flexibility, capability and functionalities, in SLA2 so as to fulfil the essential service requirements of SLA1.
Example 3 : In this usage scenario, the cloud auditor conducts independent evaluations
for a cloud service. The evaluations will relate to operations and security of cloud service
implementation. Here the cloud auditor interacts with both the cloud provider and
consumer, as shown in Fig. 3.2.5.
In all the given scenarios, the cloud consumer plays the most important role. Based on the service request, the activities of the other actors and the usage scenarios can differ for other cloud consumers. Fig. 3.2.6 shows an example of the available cloud service types.
In Fig. 3.2.6, note that SaaS applications are made available over a network to all consumers. These consumers may be organisations with access to software applications, end users, app developers or administrators. Billing is based on the number of end users, the time of use, the network bandwidth consumed and the amount or volume of data stored.
PaaS consumers can utilize the tools, execution resources and development IDEs made available by cloud providers. Using these resources, they can test, develop, manage, deploy and configure many applications hosted on a cloud. PaaS consumers are billed based on the processing, database, storage and network resources consumed and the duration of platform usage.
On the other hand, IaaS consumers can access virtual computers, network-attached storage, network components, processor resources and other computing resources on which they can deploy and run arbitrary software. IaaS consumers are billed based on the amount and duration of hardware resources consumed, the number of IP addresses, the volume of data stored, the network bandwidth and the CPU hours used over a certain duration.
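To make the usage-based billing idea above concrete, here is a small illustrative calculation in Python. All rates and usage figures are made-up numbers for the sketch, not any provider's actual pricing.

```python
# Illustrative only: the rates below are hypothetical, not real provider pricing.

def saas_bill(users, hours_used, rate_per_user=5.0, rate_per_hour=0.02):
    """SaaS billing: charged per end user and per hour of use."""
    return users * rate_per_user + hours_used * rate_per_hour

def iaas_bill(cpu_hours, gb_stored, gb_transferred,
              rate_cpu=0.05, rate_storage=0.02, rate_bandwidth=0.01):
    """IaaS billing: charged for CPU hours, storage volume and bandwidth."""
    return (cpu_hours * rate_cpu
            + gb_stored * rate_storage
            + gb_transferred * rate_bandwidth)

if __name__ == "__main__":
    print(f"SaaS monthly bill : ${saas_bill(users=40, hours_used=600):.2f}")
    print(f"IaaS monthly bill : ${iaas_bill(720, 500, 200):.2f}")
```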
Service Arbitrage : This is similar to service aggregation, except that the services being aggregated are not fixed. In service arbitrage, the broker has the liberty to choose services from different agencies, as illustrated in the sketch below.
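The following is a toy sketch of the arbitrage idea: the broker picks a provider per request based on the best current quote instead of relying on a fixed, pre-aggregated set. The provider names and prices are entirely hypothetical.

```python
# Toy sketch of service arbitrage: the broker chooses a provider per request.
# Provider names and prices are hypothetical.

quotes = {
    "provider_a": {"storage_gb_month": 0.023, "vm_hour": 0.050},
    "provider_b": {"storage_gb_month": 0.020, "vm_hour": 0.055},
    "provider_c": {"storage_gb_month": 0.026, "vm_hour": 0.045},
}

def arbitrate(service: str) -> str:
    """Return the provider currently offering the lowest price for a service."""
    return min(quotes, key=lambda p: quotes[p][service])

print(arbitrate("storage_gb_month"))  # cheapest storage right now
print(arbitrate("vm_hour"))           # cheapest compute right now
```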
7. It is cheaper than an in-house cloud implementation because users pay only for what they have used.
8. The resources are easily scalable.
Sr. No. | Parameter | Public cloud | Private cloud | Hybrid cloud | Community cloud
7 | Network | Internet | Intranet | Intranet and Internet | Internet
8 | Availability | For general public | For organization's internal staff | For general public and organization's internal staff | For community members
9 | Example | Windows Azure, AWS etc. | Openstack, VMware cloud, CloudStack, Eucalyptus etc. | Combination of Openstack and AWS | Salesforce community
From Fig. 3.4.1, we can see that Infrastructure as a Service (IaaS) is the bottommost layer in the model and Software as a Service (SaaS) lies at the top. IaaS has a lower level of abstraction and visibility, while SaaS has the highest level of visibility.
Fig. 3.4.2 represents the cloud stack organization from the physical infrastructure up to applications. In this layered architecture, the abstraction levels are such that each higher-layer service includes the services of the underlying layer.
As you can see in Fig. 3.4.2, the three services, IaaS, PaaS and SaaS, can exist independently of one another or may be combined with one another at some layers. The different layers in every cloud computing model are managed either by the user or by the vendor
(provider). In the traditional IT model, all the layers are managed by the user, who is solely responsible for hosting and managing the applications. In the case of IaaS, the top five layers are managed by the user, while the four lower layers (virtualisation, server hardware, storage and networking) are managed by the vendor or provider. So, here, the user is accountable for managing everything from the operating system up to the applications, including databases and application security. In the case of PaaS, the user needs to manage only the application, and all the other layers of the cloud computing stack are managed by the vendor. Lastly, SaaS abstracts the user from all the layers, as all of them are managed by the vendor and the user is responsible only for using the application.
The core middleware manages the physical resources, and the VMs are deployed on top of them. This deployment provides pay-per-use services and multi-tenancy. Infrastructure services support cloud development environments and provide capabilities for application development and implementation. They provide libraries, programming models, APIs, editors and so on to support application development. When these deployments are ready on the cloud, they can be used by end users and organisations. With this idea, let us further explore the different service models.
In IaaS, the customer has control over the OS, storage and installed applications, but has limited control over network components. The user cannot control the underlying cloud infrastructure. Services offered by IaaS include server hosting, computer hardware, OS, virtual instances, load balancing, web servers and bandwidth provisioning. These services are useful when demand is volatile, when computing resources are needed for a new business launch, when the company does not want to buy hardware, or when the organisation wants to expand.
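As a brief illustration of this self-service style of provisioning, the following is a minimal sketch using the AWS SDK for Python (boto3) to launch and later terminate a virtual instance. The AMI ID is a placeholder, and configured AWS credentials and region access are assumed.

```python
# Minimal sketch of IaaS provisioning with boto3.
# The AMI ID is a placeholder; valid AWS credentials are assumed.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
print("Launched:", instances[0].id)

# The same API can terminate the instance, so resources are paid for
# only while they are actually needed.
instances[0].terminate()
```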
In this model, users interact with the software to append and retrieve data, perform actions, obtain results from a process task and perform other actions allowed by the PaaS vendor. In this service model, the customer does not have any responsibility for maintaining the hardware, the software or the development environment. The applications created are the only interaction between the customer and the PaaS platform. The PaaS cloud provider is responsible for all operational aspects, such as maintenance, updates, resource management and the product lifecycle. A PaaS customer can control services such as device integration, session management, content management, sandboxing and so on. In addition, customer control is also possible over Universal Description, Discovery and Integration (UDDI), a platform-independent Extensible Markup Language (XML) based registry that allows registration and identification of web service applications.
Let us consider the example of Google App Engine.
The platform allows developers to program apps using Google's published APIs. In this platform, Google defines the tools to be used within the development framework, the file system structure and the data stores. A similar PaaS offering is given by Force.com, another vendor, which is based on the Salesforce.com development platform for the latter's SaaS offerings. Force.com provides an add-on development environment.
In PaaS, note that developers can build an app with Python and the Google APIs. Here, the PaaS vendor offers a complete solution to the developer. For instance, Google acts as a PaaS vendor and offers web service apps to users; other examples are Google Earth, Google Maps, Gmail, etc.
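To give a flavour of how little a developer manages on such a platform, here is a minimal sketch of a Python web app that could be deployed to Google App Engine's current standard environment. The runtime version, file names and deployment command mentioned in the comments are assumptions about the present-day service, not part of the original text.

```python
# main.py - minimal sketch of an App Engine web app.
# An accompanying app.yaml containing the single line "runtime: python39"
# is assumed; deployment would typically use `gcloud app deploy`.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # App Engine routes incoming HTTP requests to this handler.
    return "Hello from a PaaS-hosted application!"

if __name__ == "__main__":
    # Local testing only; in production the platform runs the app itself.
    app.run(host="127.0.0.1", port=8080, debug=True)
```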
PaaS has a few disadvantages. It locks the developer into a solution specific to the platform vendor. For example, an application developed in Python using the Google APIs on Google App Engine might work only in that environment.
PaaS is therefore not well suited to the following situations :
When the application must be portable.
When proprietary programming languages are used.
When there is a need for custom hardware and software.
Major PaaS applications include software development projects where developers and
users collaborate to develop applications and automate testing services.
3.4.2.1 Power of PaaS
PaaS offers promising services and continues to offer a growing list of benefits. The
following are some standard features that come with a PaaS solution :
Source code development : PaaS solutions provide the users with a wide range of
language choices including stalwarts such as Java, Perl, PHP, Python and Ruby.
Websites : PaaS solutions provide environments for creating, running and
debugging complete websites, including user interfaces, databases, privacy and
security tools. In addition, foundational tools are also available to help developers
update and deliver new web applications to meet the fast-changing needs and
requirements of their user communities.
Developer sandboxes : PaaS also provides dedicated "sandbox" areas for developers to check how snippets of code perform prior to a more formal test.
Sandboxes help the developers to refine their code quickly and provide an area
where other programmers can view a project, offer additional ideas and suggest
changes or fixes to bugs.
The advantages of PaaS go beyond relieving the overheads of managing servers, operating systems and development frameworks. PaaS resources can be provisioned and scaled quickly, within days or even minutes, because the organisation does not have to host any infrastructure on premises. PaaS may also help organisations reduce costs, as its multi-tenant model of cloud computing allows multiple entities to share the same IT resources. The costs are also predictable because the fees are pre-negotiated each month.
The following boosting features can empower a developer’s productivity, if efficiently
implemented on a PaaS site :
Fast deployment : For organisations whose developers are geographically scattered,
seamless access and fast deployment are important.
Integrated Development Environment (IDE) : PaaS must provide developers with an Internet-based development environment supporting a variety of languages, such as Java, Python, Perl, Ruby etc., for scripting, testing and debugging their applications.
Database : Developers must be provided with access to data and databases. PaaS
must provision services such as accessing, modifying and deleting data.
Identity management : Some mechanism for authentication management must be
provided by PaaS. Each user must have a certain set of permissions with the
administrator having the right to grant or revoke permissions.
Integration : Leading PaaS vendors, such as Amazon, Google App Engine or Force.com, provide integration with external or web-based databases and services. This is important to ensure compatibility.
Logs : PaaS must provide APIs to open and close log files, write and examine log
entries and send alerts for certain events. This is a basic requirement of application
developers irrespective of their projects.
Caching : This feature can greatly boost application performance. PaaS must make
available a tool for developers to send a resource to cache and to flush the cache.
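The following is a minimal, purely illustrative sketch of the cache facility just described: an application pushes a resource into a cache, reads it back, and flushes the cache. Real PaaS offerings expose this through their own APIs; the class and method names here are invented for the illustration.

```python
# Toy in-process cache illustrating the put / get / flush operations a PaaS
# caching service typically exposes. Not any particular vendor's API.
import time

class SimpleCache:
    def __init__(self, ttl_seconds=300):
        self._store = {}          # key -> (value, expiry timestamp)
        self._ttl = ttl_seconds

    def put(self, key, value):
        self._store[key] = (value, time.time() + self._ttl)

    def get(self, key):
        value, expires = self._store.get(key, (None, 0))
        return value if time.time() < expires else None

    def flush(self):
        self._store.clear()

cache = SimpleCache(ttl_seconds=60)
cache.put("homepage", "<html>...</html>")
print(cache.get("homepage"))   # served from the cache
cache.flush()                  # explicit flush, as the PaaS tooling would allow
```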
3.4.2.2 Complications with PaaS
Co-tenants, who share the same resources, may attack each other's objects, and third parties may attack a user's objects. Objects therefore need to be securely coded to defend themselves.
Cryptographic methods, namely symmetric and asymmetric encryption, hashing and signatures, are the solution to object vulnerability. It is the responsibility of the provider to protect the integrity and privacy of user objects on a host.
Vendor lock-in : Owing to the lack of standardisation, vendor lock-in becomes a key barrier that stops users from migrating to cloud services. Technology-related solutions are being built to tackle this problem. Most customers are unaware of the terms and conditions of providers that prevent interoperability and portability of applications. A number of strategies have been proposed on how to avoid or lessen lock-in risks before adopting cloud computing.
Lock-in issues arise when a company decides to change cloud providers but is unable to migrate its applications or data to a different vendor. Heterogeneity of cloud semantics creates technical incompatibility, which in turn leads to interoperability and portability challenges. This makes interoperation, collaboration, portability and manageability of data and services a very complex task.
compatibility is eliminated.
SaaS has the capacity to support multiple users.
In spite of the above benefits, there are some drawbacks of SaaS. For example, SaaS is not suited for applications that need real-time responses or where data is not allowed to be hosted externally.
To protect against cloud attacks, one can encrypt data before placing it in the cloud. In many countries, there are laws that require SaaS providers to keep consumer data and copyrighted material within national boundaries; this is also known as compliance or regulatory requirements. Many countries still do not have such compliance laws, so it is necessary to check the cloud service provider's SLA regarding how compliance is enforced for its services.
of Service (DDoS) attacks are another obstacle to availability. Criminals attempt to cut into SaaS providers' revenues by making their services unavailable. Some utility computing services give SaaS providers the ability to use quick scale-ups to protect themselves against DDoS attacks.
In some cases, the lock-in concern arises from the failure of a single company providing cloud storage, and because of the vendor lock-in built into some cloud providers' solutions, organizations face difficulties in migrating to a new cloud service provider. Therefore, to mitigate the challenges related to data lock-in and vendor lock-in, software stacks can be used to enhance interoperability between various cloud platforms, and APIs can be standardized to guard against data loss due to a single company's failure. This also supports "surge computing", which uses the same technological framework in both public and private clouds and is used to handle additional tasks that cannot be performed efficiently in a private cloud's data center.
Intel and AMD technologies and support legacy load balancing hardware to avoid the
challenges related to interoperability.
receive are not stored on your local hard disks but are kept on the email provider's servers. It is important to note that none of the data is stored on your local hard drives.
It is true that all computer owners store data, and for these users, finding enough storage space to hold all the data they have accumulated can seem like an impossible mission. Earlier, people stored information on the computer's hard drive or other local storage devices, but today this data is saved in a remote database, with the Internet providing the connection between the computer and the database. Fig. 3.6.1 illustrates how cloud storage works.
People may store their data on large hard drives or other external storage devices like thumb drives or compact discs, but with the cloud, the data is stored in a remote database. Fig. 3.6.1 shows a client computer, which has a bulk of data to be stored, and the control node, a third-party service provider, which manages several databases together. A cloud storage system has storage servers. The subscriber copies files to the storage servers over the internet, which then record the data. When the client needs to retrieve the data, it accesses the data server through a web-based interface, and the server either sends the files back to the client or allows the client to access and manipulate the data on the server itself.
Cloud storage is a service model in which data is maintained, managed and backed up remotely and made available to users over a network. Cloud storage provides extremely efficient storage of objects that scales to exabytes of data. It allows users to access data from any storage class instantly, integrate storage into their applications with a single unified API and optimize performance with ease. It is the responsibility of cloud storage providers
to keep the data available and accessible and to secure and run the physical environment.
Even though data is stored and accessed remotely, you can maintain data both locally and
on the cloud as a measure of safety and redundancy.
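As one concrete illustration of the "single unified API" idea above, here is a minimal sketch using the Google Cloud Storage Python client. The bucket name and object paths are placeholders, and configured application credentials are assumed.

```python
# Minimal sketch of storing and retrieving an object through a cloud storage
# API. Bucket name and paths are placeholders; credentials are assumed.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-bucket-name")   # placeholder bucket

# Upload a local file as an object, then read it back later from anywhere.
blob = bucket.blob("reports/2024/summary.pdf")
blob.upload_from_filename("summary.pdf")

# The same API retrieves the object, regardless of which storage class holds it.
blob.download_to_filename("summary-copy.pdf")
```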
A cloud storage system requires at least one data server connected to the internet. The client sends copies of its files to that data server, which saves the information, and the server sends the files back to the client when requested. Through the web-based interface, the server also allows the client to access and change the files on the server itself whenever he or she wants. The connection between the computer and the database is provided by the internet. Cloud storage services, however, use tens or hundreds of data servers. Since servers need maintenance or repair, it is important to store data on several machines, providing redundancy. Without redundancy, cloud storage services could not guarantee clients that they would be able to access their information at any given time. There are two techniques used for storing data on the cloud, called cloud sync and cloud backup, which are explained as follows.
storage to end users who lack the budget, or a capital budget, to pay for it on their own. End users store their data on rented storage space at a remote location in the cloud. Storage as a service providers rent their storage space to organizations on a cost-per-gigabyte-stored or cost-per-data-transfer basis. The end user doesn't have to pay for the infrastructure; they pay only for how much data they transfer to and save on the provider's servers.
Storage as a service is a good alternative for small or mid-size businesses that lack the capital budget to implement and maintain their own storage infrastructure. The key providers of storage as a service include Amazon S3, Google Cloud Storage, Rackspace, Dell EMC, Hewlett Packard Enterprise (HPE), NetApp and IBM. It is also being promoted as a way for all companies to mitigate their risks in disaster recovery, provide long-term retention of records and enhance both business continuity and availability. Small-scale enterprises find it very difficult and costly to buy dedicated storage hardware for data storage and backup. This issue is addressed by storage as a service, a business model that helps small companies rent storage from large companies with a wider storage infrastructure. It is also suitable when technical staff are not available or have insufficient experience to implement and manage a storage infrastructure.
Individuals as well as small companies can use storage as a service to save cost and manage backups. They save cost on hardware, personnel and physical space. Storage as a service is also called hosted storage, and the companies that provide it are called Storage Service Providers (SSPs). Vendors promote storage as a service as a convenient way of managing backups in the enterprise, targeting secondary storage applications. It also helps in mitigating the effects of disasters through offsite recovery.
Storage providers are responsible for storing their customers' data under this model. The storage provider supplies the software required for the client to access their stored data on the cloud from anywhere and at any time. Customers use that software to perform standard storage-related activities, including data transfers and backups. Since storage as a service vendors agree to meet SLAs, businesses can be assured that storage will scale and perform as required. It can facilitate direct connections to both public and private cloud storage.
In most instances, organizations that use storage as a service opt for the public cloud for storage and backup instead of keeping data on premises. The offerings provided by storage as a service include backup and restore, disaster recovery, block storage, SSD storage, object storage and transmission of bulk data. Backup and restore refers to backing data up to the cloud, which provides protection and recovery when data loss occurs.
Disaster recovery refers to protecting and replicating data from Virtual Machines (VMs) in case of disaster. Block storage allows customers to provision block storage volumes for lower-latency I/O. SSD storage is another type of storage, generally used for data-intensive read/write and I/O operations. Object storage systems are used in data analytics, disaster recovery and cloud applications. Cold storage provides low-cost storage for data that can be created and configured quickly but is accessed infrequently. Bulk data transfers can use disks and other equipment for transmitting large volumes of data.
There are many cloud storage providers available on the internet; some of the popular storage as a service providers are listed as follows :
Google Drive - Google provides Google Drive as a storage service for every Gmail user, who can store up to 15 GB of data free of cost, which can scale up to ten terabytes. It integrates Google Docs with the Google account, allowing users to upload documents, spreadsheets and presentations to Google's data servers.
Microsoft OneDrive - Microsoft provides OneDrive with 5 GB of free storage space, scalable to 5 TB, for storing users' files. It is integrated with Microsoft 365 and Outlook mail. It allows users to synchronize files between the cloud and a local folder, and provides client software for any platform to store and access files from multiple devices. Files can be backed up with ransomware protection, and previously saved versions of files or data can be recovered from the cloud.
Dropbox - Dropbox is a file hosting service that offers cloud storage, file synchronization, personal cloud and client software services. It can be installed and run on any OS platform. It provides free storage space of 2 GB, which can scale up to 5 TB.
MediaMax and Strongspace - These offer rented storage space for any kind of digital data to be stored on cloud servers.
server as long as your internet connection is working. When your internet connection faces technical problems or stops functioning, you will face difficulties in transmitting data to, or recovering it from, the remote server.
Compliance problems : Many cloud service providers are prone to weaker compliance, as many countries restrict cloud service providers from moving their users' data across the country's geographic boundaries; if they do so, they may be penalized, or the IT operations of that provider in that country may be shut down, which may lead to huge data loss. Therefore, one should never purchase cloud storage from an unknown source or third party, and should always buy from well-established companies. Depending on the degree of regulation within your industry, it might not be possible to operate within the public cloud at all. This is particularly the case for healthcare, financial services and publicly traded enterprises, which need to be very cautious when considering this option.
Vulnerability to attacks : With your business information stored in the cloud, vulnerability to external hacking attacks is always present. The internet is not entirely secure, and for this reason sensitive data can still be stolen.
Data management : Managing cloud data can be a challenge because cloud storage systems have their own structures. Your business's current storage management system may not always fit well with the system offered by the cloud provider.
Data protection concerns : There are issues surrounding the remote storage of sensitive and essential data. Before adopting cloud technologies, you should be aware that you are providing confidential business details to a third-party cloud service provider and that this could potentially harm your firm. That is why it is crucial to choose a trustworthy service provider you trust to keep your information protected.
on a remote data center system. Users can then access these files via an internet connection. The cloud storage provider also sells non-storage services for a fee. Enterprises purchase computing, software, storage and related IT components as discrete cloud services with a pay-as-you-go license. Customers may choose to lease infrastructure as a service; platform as a service; or security, software and storage as a service. The level and type of services chosen are set out in a service level agreement signed with the provider. The ability to streamline costs by using the cloud can be particularly beneficial for small and medium-sized organizations with limited budgets and IT staff. The main advantages of using a cloud storage provider are cost control, elasticity and self-service. Users can scale computing resources on demand as needed and then discard those resources after the task has been completed. This removes any concern about exceeding storage limitations with on-site networked storage. Some popular cloud storage providers are Amazon Web Services, Google, Microsoft and Nirvanix. Descriptions of these popular cloud storage providers are given below :
Amazon S3 : Amazon S3 (Simple Storage Service) offers a simple cloud services
interface that can be used to store and retrieve any amount of data from anywhere
on the cloud at any time. It gives every developer access to the same highly scalable
data storage infrastructure that Amazon uses to operate its own global website
network. The goal of the service is to optimize the benefits of scale and to pass those
benefits on to the developers.
Google Bigtable datastore : Google defines Bigtable as a fast and highly scalable datastore. The Google Cloud Platform allows Bigtable to scale across thousands of commodity servers that can together store petabytes of data. Bigtable has been designed with very high speed, versatility and extremely high scalability in mind. The size of a Bigtable database can be petabytes, spanning thousands of distributed servers. Bigtable is available to developers as part of Google App Engine, Google's cloud computing platform.
Microsoft Live Mesh : Windows Live Mesh was a free-to-use internet-based file synchronization application designed by Microsoft to enable files and directories on two or more computers to be synchronized on Windows or Mac OS platforms. It supports mesh objects that consist of data feeds, which can be represented in Atom, RSS, JSON or XML. It uses Live Framework APIs to share any data item between devices that recognize the data.
Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with usage-based pricing. It supports Cloud-based Network Attached Storage (CloudNAS), which lets cloud storage be accessed like on-premises NAS. Nirvanix CloudNAS is intended for businesses that manage archival, backup or unstructured archives needing long-term, secure storage, or organizations that use automated processes to migrate files to mapped drives. CloudNAS has built-in disaster data recovery and an automatic data replication feature for up to three geographically distributed storage nodes.
The S3 system allows buckets to be named (Fig. 3.10.2), but the name must be unique in the S3 namespace across all consumers of AWS. A bucket can be accessed through the S3 web API (with SOAP or REST), which is similar to a normal disk storage system.
The performance of S3 makes it best suited to non-operational functions such as data archiving, retrieval and disk backup. The REST API is preferred to the SOAP API because it is easier to work with large binary objects in REST.
Amazon S3 offers large volumes of reliable storage with high protection and low-bandwidth access. S3 is most ideal for applications that need storage archives; for example, S3 is used by large sites that share photos and images.
The APIs to manage buckets have the following features (a brief usage sketch follows this list) :
Create new, modify or delete existing buckets.
Upload or download new objects to a bucket.
Search and identify objects in buckets.
Identify metadata associated with objects and buckets.
Specify where the bucket is stored.
Provide public access to buckets and objects.
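The following is a brief sketch of several of the bucket operations listed above, using the AWS SDK for Python (boto3) rather than raw REST calls. The bucket name is a placeholder (bucket names must be globally unique), and valid AWS credentials are assumed.

```python
# Sketch of common S3 bucket operations with boto3.
# Bucket name is a placeholder; credentials and permissions are assumed.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "example-unique-bucket-name"

s3.create_bucket(Bucket=bucket)                            # create a new bucket
s3.upload_file("report.csv", bucket, "data/report.csv")    # upload an object

# Search and identify objects, and read their metadata.
for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"], obj["Size"])
meta = s3.head_object(Bucket=bucket, Key="data/report.csv")

# Identify where the bucket is stored.
print(s3.get_bucket_location(Bucket=bucket))
```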
The S3 service can be used by many users as a backup component in a 3-2-1 backup
method. This implies that your original data is 1, a copy of your data is 2 and an off-site
copy of data is 3. In this method, S3 is the 3rd level of backup. In addition to this, Amazon
S3 provides the feature of versioning.
With versioning, every version of an object stored in an S3 bucket is retained, but the user must enable the versioning feature. Any HTTP or REST operation, namely PUT, POST, COPY or DELETE, creates a new object that is stored along with the older version. A GET operation retrieves the newest version of the object, but the ability to recover and undo actions is also available. Versioning is a useful method for preserving and archiving data.
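A small sketch of enabling and inspecting versioning with boto3 is shown below. The bucket name is a placeholder and the caller is assumed to have permission to change the bucket configuration.

```python
# Sketch of S3 versioning with boto3; bucket name is a placeholder.
import boto3

s3 = boto3.client("s3")
bucket = "example-unique-bucket-name"

# Versioning must be switched on explicitly by the user.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Every PUT now keeps the older object versions alongside the new one.
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"first draft")
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"second draft")

# A plain GET returns the newest version; older ones remain recoverable.
versions = s3.list_object_versions(Bucket=bucket, Prefix="notes.txt")
for v in versions.get("Versions", []):
    print(v["VersionId"], v["IsLatest"])
```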
3.10.2 Glacier Vs S3
Both Amazon S3 and Amazon Glacier work in almost the same way. However, there are certain important aspects that reflect the difference between them. Table 3.10.1 compares Amazon Glacier and Amazon S3 :
Amazon Glacier | Amazon S3
It is recognised by system-generated archive IDs | It can use "friendly" key names
It is extremely low-cost storage | Its cost is much higher than Amazon Glacier
You can also use the Amazon S3 interface to avail the offerings of Amazon Glacier without having to learn a new interface. This is done by using Glacier as an S3 storage class together with object lifecycle policies, as sketched below.
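Here is a minimal sketch of such a lifecycle rule with boto3: objects under a prefix transition to the Glacier storage class after 30 days. The bucket name, rule ID and prefix are placeholders chosen for the illustration.

```python
# Sketch of an object lifecycle rule that moves objects to Glacier after 30 days.
# Bucket name, rule ID and prefix are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-unique-bucket-name",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-objects",
                "Filter": {"Prefix": "archive/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```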
Summary
The cloud architecture design is an important aspect of designing a cloud. Every cloud platform is intended to meet four essential design goals : scalability, reliability, efficiency and virtualization. To achieve these goals, certain requirements have to be considered.
The layered architecture of a cloud is composed of three basic layers called
infrastructure, platform, and application. These three levels of architecture are
implemented with virtualization and standardization of cloud-provided
hardware and software resources.
The NIST cloud computing reference architecture is designed with the help of IT vendors, developers of standards, industries and other governmental agencies at a global level to support effective cloud computing security standards and their further development.
Cloud deployment models are defined according to where the computing infrastructure resides and who controls that infrastructure. There are four deployment models, characterized by the functionality and accessibility of cloud services, namely public, private, hybrid and community. Public cloud services run over the internet, so users who want cloud services must have an internet connection on their local device. Private cloud services are used by organizations internally and most of the time run over an intranet connection. Hybrid cloud services are composed of two or more clouds and offer the benefits of multiple deployment models, while a community cloud is basically a combination of one or more public, private or hybrid clouds shared by many organizations for a single cause.
The most widespread services of cloud computing are categorised into three service classes, also called cloud service models, namely IaaS, PaaS and SaaS.
Infrastructure as a Service (IaaS) can be defined as the use of servers, storage, computing power, network and virtualization to form utility-like services for users. Platform as a Service (PaaS) is a computing platform that allows users to create web applications quickly and easily, without worrying about buying and maintaining the underlying software and infrastructure. Software as a Service (SaaS) is specifically designed for on-demand application or software delivery to cloud users.
There are six challenges in cloud architectural design, relating to data privacy and security, compliance, performance, interoperability and standardization, service availability, licensing, data storage and software bugs.
Q.1 Bring out differences between private cloud and public cloud. AU : Dec.-16
Ans. : The differences between private cloud and public cloud are given in Table 3.1.
9 | Example | Public cloud : Windows Azure, AWS etc. | Private cloud : Openstack, VMware Cloud, CloudStack, Eucalyptus etc.
Ans. : Hybrid cloud services are composed of two or more clouds and offer the benefits of multiple deployment models. A hybrid cloud mostly comprises an on-premises private cloud and an off-premises public cloud, to leverage the benefits of both and allow users inside and outside the organization to have access to it. The hybrid cloud provides flexibility such that users can migrate their applications and services from the private cloud to the public cloud and vice versa. It has become highly favored in the IT industry because of features like mobility, customized security, high throughput, scalability, disaster recovery, easy backup and replication across clouds, high availability and cost efficiency. The other benefits of a hybrid cloud are :
Easy accessibility between the private cloud and the public cloud, with a plan for disaster recovery.
The ability to decide what needs to be shared on the public network and what needs to be kept private.
Unmatched scalability on demand.
PaaS | SaaS
Operational cost is lower than IaaS. | Operational cost is very minimal compared to IaaS and PaaS.
It has lower portability than IaaS. | It doesn't provide portability.
Examples : AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine, Apache Stratos, OpenShift | Examples : Google Apps, Dropbox, Salesforce, Cisco WebEx, Concur, GoToMeeting
Q.6 What are the basic requirements for cloud architecture design ?
Ans. : The basic requirements for cloud architecture design are given as follows :
The cloud architecture design must provide automated delivery of cloud services
along with automated management.
It must support latest web standards like Web 2.0 or higher and REST or RESTful
APIs.
It must support very large - scale HPC infrastructure with both physical and virtual
machines.
The architecture of cloud must be loosely coupled.
It should provide easy access to cloud services through a self-service web portal.
Cloud management software must efficiently receive user requests, find the correct resources, and then call the provisioning services that invoke the resources in the cloud.
It must provide enhanced security for shared access to the resources from data
centers.
It must use cluster architecture for getting the system scalability.
install anything on a local device. The platform layer has a collection of software tools for developing, deploying and testing software applications. A collection of all the software modules required for SaaS applications forms the application layer. This layer is mainly responsible for on-demand application delivery. In this layer, software applications include day-to-day office management software used for information collection, document processing, calendaring and authentication. Enterprises also use the application layer extensively in business marketing, sales, Customer Relationship Management (CRM), financial transactions and Supply Chain Management (SCM).
Q.8 What are different roles of cloud providers ?
Ans. : Cloud provider is an entity that offers cloud services to interested parties. A
cloud provider manages the infrastructure needed for providing cloud services. The
CSP also runs the software to provide services, and organizes the service delivery to
cloud consumers through networks.
SaaS providers then deploy, configure, maintain and update all operations of the
software application on the cloud infrastructure, in order to ensure that services are
provisioned and to fulfil cloud consumer service requests. SaaS providers assume most
of the responsibilities associated with managing and controlling applications deployed
on the infrastructure. On the other hand, SaaS consumers have no or limited
administrative controls.
The major activities of a cloud provider include :
Service deployment : Service deployment refers to provisioning private, public,
hybrid and community cloud models.
Service orchestration : Service orchestration implies the coordination, management
of cloud infrastructure, and arrangement to offer optimized capabilities of cloud
services. The capabilities must be cost-effective in managing IT resources and must
be determined by strategic business needs.
Cloud services management : This activity involves all service-related functions
needed to manage and operate the services requested or proposed by cloud
consumers.
Security : Security, which is a critical function in cloud computing, spans all layers in
the reference architecture. Security must be enforced end-to-end. It has a wide range
from physical to application security. CSPs must take care of security.
Privacy : Privacy in cloud must be ensured at different levels, such as user privacy,
data privacy, authorization and authentication, and it must also have adequate
assurance levels. Since clouds allow resources to be shared, privacy challenges are a
big concern for consumers using clouds.
Q.9 What are different complications in PaaS ?
Ans. : The following are some of the complications or issues of using PaaS :
Interoperability : PaaS works best on each provider's own cloud platform, allowing customers to get the most value out of the service. But the risk here is that the customisations or applications developed in one vendor's cloud environment may not be compatible with another vendor's, and hence may not migrate easily to it.
Although customers most of the time agree to being tied to a single vendor, this may not be the situation every time; users may want to keep their options open. In this situation, developers can opt for open-source solutions. Open-source PaaS provides flexibility by revealing the underlying code and the ability to install the PaaS solution on any infrastructure. The disadvantage of using an open-source version of PaaS is that certain benefits of an integrated platform are lost.
Compatibility : Most businesses have a restricted set of programming languages,
architectural frameworks and databases that they deploy. It is thus important to
make sure that the vendor you choose supports the same technologies. For example,
if you are strongly dedicated to a .NET architecture, then you must select a vendor
with native .NET support. Likewise, database support is critical to performance and
minimising complexity.
The size of the Bigtable database can be petabytes, spanning thousands of distributed
servers. Bigtable is now open to developers as part of the Google App Engine, their
cloud computing platform.
Microsoft Live Mesh : Windows Live Mesh was a free-to-use Internet-based file
synchronization application designed by Microsoft to enable files and directories
between two or more computers to be synchronized on Windows or Mac OS
platforms. It has support of mesh objects that consists of data feeds, which can be
represented in Atom, RSS, JSON, or XML. It uses Live Framework APIs to share any
data item between devices that recognize the data.
Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with usage-based pricing. It supports Cloud-based Network Attached Storage (CloudNAS), which lets cloud storage be accessed like on-premises NAS. Nirvanix CloudNAS is intended for
businesses that manage archival, backup, or unstructured archives that need long-
term, secure storage, or organizations that use automated processes to migrate files
to mapped drives. The CloudNAS has built-in disaster data recovery and automatic
data replication feature for up to three geographically distributed storage nodes.
Q.13 What is Amazon S3 ?
Ans. : Amazon S3 is a cloud-based storage system that allows storage of data objects ranging from 1 byte up to 5 GB in a flat namespace. The storage containers in S3 are called buckets; a bucket serves the function of a directory, although there is no object hierarchy within a bucket, and the user saves objects, not files, to it. Amazon S3
offers a simple web services interface that can be used to store and retrieve any amount
of data from anywhere, at any time on the web. It gives any developer access to the
same scalable, secure, fast, low-cost data storage infrastructure that Amazon uses to
operate its own global website network.
Q.1 With architecture, elaborate the various deployment models and reference
models of cloud computing. AU : Dec.-17
Ans. : Refer section 3.3 for cloud deployment models and section 3.4 for cloud reference
models.
Q.2 Describe service and deployment models of cloud computing environment
with illustration. How do they fit in NIST cloud architecture ? AU : Dec.-17
Ans. : Refer section 3.3 for cloud deployment models and section 3.4 for cloud reference
models and section 3.2 for NIST cloud architecture.
Q.3 List the cloud deployment models and give a detailed note about them.
AU : Dec.-16
Ans. : Refer section 3.3 for cloud deployment models.
Q.4 Give the importance of cloud computing and elaborate the different types of
services offered by it. AU : Dec.-16
Ans. : Refer section 3.4 for cloud service models.
Q.5 What are pros and cons for public, private and hybrid cloud ? AU : Dec.-18
Ans. : Refer section 3.3 for pros and cons of public, private and hybrid cloud and
section 3.3.5 for their comparison.
Q.6 Describe Infrastructure as a Service (IaaS), Platform-as-a-Service (PaaS) and
Software-as-a-Service (SaaS) with example. AU : Dec.-18
Ans. : Refer section 3.4 for cloud service models for description of Infrastructure as a
Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).
Q.7 Illustrate the cloud delivery models in detail. AU : Dec.-19
Ans. : Refer section 3.4 for cloud delivery models.
Q.13 Explain in detail cloud storage along with its pros and cons.
Ans. : Refer section 3.6 for cloud storage and 3.8 for pros and cons of cloud storage.
4 Resource Management and
Security in Cloud
Syllabus
Inter Cloud Resource Management - Resource Provisioning and Resource Provisioning
Methods - Global Exchange of Cloud Resources - Security Overview - Cloud Security
Challenges - Software-as-a-Service Security - Security Governance - Virtual Machine Security -
IAM - Security Standards.
Contents
4.1 Inter Cloud Resource Management
4.2 Resource Provisioning and Resource Provisioning Methods
4.3 Global Exchange of Cloud Resources
4.4 Security Overview
4.5 Cloud Security Challenges
4.6 Software-as-a-Service Security
4.7 Security Governance
4.8 Virtual Machine Security
4.9 IAM
4.10 Security Standards
The consequence is that one cannot directly launch SaaS applications on a cloud platform. The cloud platform for SaaS cannot be built unless compute, storage and network infrastructures are established.
In the above architecture, the lower three layers are more closely connected to physical specifications.
Hardware as a Service (HaaS) is the lowermost layer, which provides the various hardware resources needed to run cloud services.
The next layer is Infrastructure as a Service, which interconnects all hardware elements using compute, storage and network services.
The layer above has two services, namely Network as a Service (NaaS), to bind and provision cloud services over the network, and Location as a Service (LaaS), to provide a collocation service to house, control and protect all physical hardware and network resources.
The next layer is Platform as a Service, for web application deployment and delivery, while the topmost layer is used for on-demand application delivery.
In any cloud platform, cloud infrastructure performance is the primary concern of every cloud service provider, while quality of service, service delivery and security are the concerns of cloud users. Every SaaS application is subdivided into different application areas for business applications; for example, CRM is used for sales, promotion and marketing services. CRM offered the first successful SaaS on the cloud. Other tools may provide distributed collaboration, financial management or human resources management.
In inter-cloud resource provisioning, developers have to consider how to design the system to meet critical requirements such as high throughput, high availability (HA) and fault tolerance. The infrastructure for operating cloud computing services may be either a physical server
or a virtual server. By using VMs, the platform can be flexible, i.e. running services are
not associated with specific hardware platforms. This adds flexibility to cloud computing
platforms. The software layer at the top of the platform is a layer for storing huge
amounts of data.
As in a cluster environment, there are runtime support services available in the cloud computing environment. Cluster monitoring is used to obtain the running state of the cluster as a whole. The scheduler queues the tasks submitted to the entire cluster and assigns tasks to processing nodes according to node availability, as sketched below. The runtime support system helps keep the cloud cluster working with high efficiency.
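The following is a toy sketch of that scheduling idea, not any particular cloud's scheduler: submitted tasks wait in a queue and are dispatched to whichever processing node the monitoring system reports as free.

```python
# Toy queue-based cluster scheduler illustrating task-to-node assignment.
from collections import deque

class ClusterScheduler:
    def __init__(self, nodes):
        self.free_nodes = deque(nodes)   # nodes currently available
        self.queue = deque()             # tasks waiting for a node

    def submit(self, task):
        self.queue.append(task)
        self._dispatch()

    def node_finished(self, node):
        self.free_nodes.append(node)     # monitoring reports the node as free
        self._dispatch()

    def _dispatch(self):
        while self.queue and self.free_nodes:
            task, node = self.queue.popleft(), self.free_nodes.popleft()
            print(f"running {task} on {node}")

sched = ClusterScheduler(["node-1", "node-2"])
for t in ["job-a", "job-b", "job-c"]:
    sched.submit(t)                      # job-c waits until a node frees up
sched.node_finished("node-1")
```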
Runtime support is the software needed for browser-initiated applications used by
thousands of cloud customers. The SaaS model offers software solutions as a service,
rather than requiring users to buy software. As a result, there is no initial investment in
servers or software licenses on the customer side. On the provider side, the cost is rather
low compared to the conventional hosting of user applications. Customer data is stored in
a cloud that is either private or publicly hosted by PaaS and IaaS providers.
users use virtual machines as physical hosts with customized operating systems for different applications.
For example, Amazon's EC2 uses Xen as the Virtual Machine Monitor (VMM), which is also used in IBM's Blue Cloud. Some VM templates are supplied on the EC2 platform, from which users can select different types of VMs, whereas no VM templates are provided by IBM's Blue Cloud; any form of VM may generally be run on top of Xen. Microsoft also applies virtualization in its Azure cloud platform. The provider should deliver resource-economic services; the increase in energy wasted as heat dissipated from data centers means that power-efficient caching, query processing and heat management schemes are necessary. Public and private clouds promise to streamline software, hardware and data as a service, provisioned on demand, saving on IT deployment and achieving economies of scale in IT operations.
In storage, numerous technologies are available, such as SCSI, SATA, SSDs, flash storage and so on. In future, hard disk drives combined with solid-state drives may be used as an enhancement in storage technology, which would ensure reliable and high-performance data storage. The key obstacles to the adoption of flash memory in data centers have been price, capacity and, to some extent, the lack of specialized query processing techniques. However, this is about to change, as the I/O bandwidth of solid-state drives is becoming too impressive to overlook.
Databases are very popular, as many applications use them as the underlying storage container. The size of such a database can be very large when processing huge quantities of data. The main aim is to store data in structured or semi-structured form so that application developers can use it easily and construct their applications quickly. Traditional databases may hit a performance bottleneck when the system is scaled up. However, some real applications do not need such strong consistency, and the size of these databases can keep growing. Typical cloud databases include Google's BigTable, Amazon's SimpleDB or DynamoDB, and the Azure SQL service from Microsoft Azure.
technologies use the tools of the distributed grid. The InterGrid Gateway (IGG) assigns and manages a Distributed Virtual Environment (DVE), which is a cluster of available VMs isolated from other virtual clusters. The DVE Manager component performs resource allocation and management on behalf of particular user applications. The central component of the IGG is the scheduler, which enforces provisioning policies and peers with other gateways. The communication system provides an asynchronous message-passing mechanism that is handled in parallel by a thread pool.
In demand-driven resource provisioning, resources are allocated as per the demand of users in a dynamic environment. This method adds or removes computing instances depending on the current level of utilization of the allocated resources. For example, the demand-driven method automatically allocates two CPUs to a user application when the user has been using one CPU for more than 60 percent of the time over an extended period. In general, when a resource has exceeded a threshold for a certain amount of time, the system increases that resource on the basis of demand. If the resource is utilized below the threshold for a certain amount of time, that resource is reduced accordingly. This method is implemented by Amazon Web Services as the auto-scaling feature that runs on its EC2
instances. This method is very easy to implement, but it does not work well if the workload changes abruptly.
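A minimal sketch of the threshold logic described above is given below in Python. The 60 percent threshold, the observation window and the instance counter are illustrative assumptions, not part of any particular provider's API.

    # Hypothetical threshold-based (demand-driven) scaler sketch.
    from collections import deque

    class DemandDrivenScaler:
        def __init__(self, upper=0.60, lower=0.20, window=5):
            self.upper, self.lower = upper, lower   # utilization thresholds
            self.samples = deque(maxlen=window)     # recent CPU utilization samples
            self.instances = 1                      # currently allocated instances

        def observe(self, cpu_utilization):
            """Record one utilization sample (0.0 - 1.0) and react if needed."""
            self.samples.append(cpu_utilization)
            if len(self.samples) < self.samples.maxlen:
                return                              # wait until the window is full
            avg = sum(self.samples) / len(self.samples)
            if avg > self.upper:                    # sustained high load : scale out
                self.instances += 1
                self.samples.clear()
            elif avg < self.lower and self.instances > 1:  # sustained low load : scale in
                self.instances -= 1
                self.samples.clear()

    scaler = DemandDrivenScaler()
    for util in [0.7, 0.8, 0.75, 0.9, 0.85]:        # simulated monitoring feed
        scaler.observe(util)
    print(scaler.instances)                         # -> 2 after sustained load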
In event-driven resource provisioning, resources are allocated whenever an event is generated by users at a specific time interval in a dynamic environment. This method adds or removes machine instances based on a specific time event. The approach works better for seasonal or predicted events where additional resources are required for a shorter interval of time : the number of users increases before the event and decreases over the course of the event. This scheme estimates peak traffic before the event happens. The method results in a small loss of QoS if the occurrence is predicted correctly; otherwise, the wasted resources are even larger, because such events do not always follow a fixed pattern.
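As a rough illustration of the event-driven idea, the sketch below chooses capacity from a predicted-event calendar rather than from live utilization; the schedule entries and instance counts are made-up examples.

    # Hypothetical event-driven (scheduled) scaling sketch.
    from datetime import datetime

    # Assumed example schedule : (start_hour, end_hour, instances) for a known event day.
    EVENT_SCHEDULE = [
        (0, 8, 2),     # overnight baseline
        (8, 18, 10),   # predicted event window needs extra capacity
        (18, 24, 4),   # wind-down after the event
    ]

    def planned_capacity(now: datetime) -> int:
        """Return the number of instances planned for the current hour."""
        for start, end, instances in EVENT_SCHEDULE:
            if start <= now.hour < end:
                return instances
        return 2  # default baseline if no rule matches

    print(planned_capacity(datetime(2024, 1, 1, 12)))  # -> 10 during the event window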
cloud data centers to meet cloud customers' QoS targets. Moreover, no single cloud infrastructure provider will be able to set up data centers everywhere around the world. This makes it difficult for cloud application service (SaaS) providers to meet QoS targets for all their customers. They therefore want to make use of the resources of multiple cloud infrastructure providers that can best serve their specific needs. This kind of requirement often arises in companies with global operations and applications such as Internet services, media hosting and Web 2.0 applications. It calls for a federation of cloud infrastructure providers that offer services to multiple cloud providers. To accomplish this, the Intercloud architecture has been proposed, which enables brokerage and sharing of cloud resources so that applications can scale across multiple clouds. The generalized Intercloud architecture is shown in Fig. 4.3.1.
and can demonstrate compliance to their auditors. Lack of trust between service providers and cloud users has prevented cloud computing from being generally accepted as a solution for on-demand services.
Trust and privacy are also more challenging for web and cloud services, as most desktop and server users have resisted handing their applications over to a cloud provider's data center. Some users worry about the lack of privacy, security and copyright protection on cloud platforms. Trust is not merely a technological question but a social problem; however, the social problem can be addressed with a technical approach. The cloud uses a virtualized environment, which poses new security threats that are harder to manage than in traditional client-server configurations. Therefore, a new data protection model is needed to solve these problems.
Three basic enforcements of cloud security are expected. First, data center facilities require year-round on-site security; biometric readers, CCTV (closed-circuit television), motion detection and mantraps are frequently deployed. Second, global firewalls, Intrusion Detection Systems (IDSes) and third-party vulnerability assessments are often required to meet fault-tolerant network security. Finally, the security platform must require SSL transmission, data encryption, strict password policies and certification of the system's trust, since cloud servers can be either physical or virtual machines.
Security compliance requires a security-aware cloud architecture that provides remedies for malware-based attacks such as worms, viruses and DDoS attacks, which exploit system vulnerabilities. These attacks compromise the functionality of the system or give intruders unauthorized access to critical information.
To secure the cloud, there is a need to protect the infrastructure first. Infrastructure security is an important factor in cloud security. The cloud is composed of a network of connected servers, called hosts, with applications deployed on them.
Infrastructure security follows a three-level security model composed of network-level security, host-level security and application-level security. The three levels of infrastructure security are explained as follows.
cloud provider, so we need to ensure data confidentiality and integrity together.
For example, as per an Amazon Web Services (AWS) security vulnerability report, users had used digital signature algorithms to access Amazon SimpleDB and Amazon Elastic Compute Cloud (EC2) over HTTP instead of HTTPS. Because of that, they faced an increased risk that their data could have been altered in transit without their knowledge.
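As a hedged illustration of this point, the sketch below forces HTTPS (and certificate verification) when calling an AWS-style API with the boto3 library; the region is a placeholder and the snippet assumes credentials are already configured.

    # Sketch : always use HTTPS (TLS) for cloud API calls so requests are both
    # encrypted and integrity-protected in transit. Assumes boto3 is installed.
    import boto3

    s3 = boto3.client(
        "s3",
        region_name="us-east-1",   # placeholder region
        use_ssl=True,              # refuse plain-HTTP transport
        verify=True,               # verify the server's TLS certificate
    )

    # Signed requests now travel over HTTPS; tampering in transit is detectable.
    response = s3.list_buckets()
    print([b["Name"] for b in response.get("Buckets", [])])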
The common types of attacks that occur at the network, host and application levels are explained in Table 4.4.1.
Table 4.4.1 : Common types of attacks at the network, host and application levels
In a cloud environment, resources are shared with other companies. In such an environment, you may not have awareness or control of where your resources are running in a shared pool outside the organization's boundary. Sharing your data in such an environment with other companies may give the government a reasonable basis to seize your assets because another company has violated the law; you may put your data at risk of seizure simply because you have shared the cloud environment. In addition, if you want to switch from one cloud provider to another, the storage services offered by one cloud vendor may be incompatible with another platform's services; for example, Amazon's Simple Storage Service (S3) is incompatible with IBM's Blue Cloud, Dell's or Google's cloud platform. In a storage cloud, most clients want their data to be encrypted via SSL (Secure Sockets Layer) in both directions across the Internet, and they most probably also want their data encrypted while it sits in the cloud storage pool. Therefore, who controls the encryption/decryption keys when information is encrypted in the cloud ? Is it the client or the cloud vendor ? These questions are often unanswered. Before moving data to the cloud, make sure that encryption and decryption are working and tested, just as when the data resides on your own servers.
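One way to keep control of the keys, as discussed above, is to encrypt data on the client before it ever reaches the storage cloud. A minimal sketch using the Python cryptography library is shown below; the sample data and the idea of uploading the ciphertext afterwards are illustrative assumptions.

    # Sketch : client-side encryption so the cloud provider never sees plaintext
    # or the key. Requires the 'cryptography' package.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()          # keep this key on-premises, never upload it
    cipher = Fernet(key)

    plaintext = b"confidential business record"
    ciphertext = cipher.encrypt(plaintext)    # this is what gets uploaded to the cloud

    # ... ciphertext would be written to the storage cloud here ...

    recovered = cipher.decrypt(ciphertext)    # only the key holder can do this
    assert recovered == plaintext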
Data integrity means making certain that data remains identical during every operation (e.g. transmission, storage or retrieval). In other words, data integrity ensures the consistency and correctness of the data. Ensuring the integrity of information means that it changes only through authorized transactions. This sounds good, but you must remember that there is still no common standard for ensuring data integrity.
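In practice, a simple way to check that stored or transmitted data has not changed is to compare cryptographic digests before and after the operation. The sketch below uses SHA-256 from Python's standard library and is only an illustration of the idea, not a substitute for a full integrity standard; the sample records are invented.

    # Sketch : detect unauthorized modification by comparing SHA-256 digests
    # computed before upload and after download.
    import hashlib

    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    original = b"invoice #42: amount due 100.00"
    stored_fingerprint = digest(original)           # recorded before sending to the cloud

    retrieved = b"invoice #42: amount due 900.00"   # tampered copy coming back
    if digest(retrieved) != stored_fingerprint:
        print("integrity check failed : data was altered")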
The use of SaaS services in the cloud means that much less software development is needed in-house. If you plan to use internally developed cloud code, a formal Secure Software Development Life Cycle (SDLC) is even more important. Inadequate use of mashup technology (combinations of web services), which is crucial for cloud applications, will almost certainly lead to unknown security vulnerabilities in such applications. A security model should be integrated into the development tools to guide developers during the development phase and to restrict users to their authorized data once the system has been deployed. With an increasing number of mission-critical processes moving into the cloud, SaaS providers will need to provide log information directly and in real time, probably for their administrators and their customers alike. Someone must take responsibility for monitoring security and compliance controls; without applications and data being tracked down to the end user, providers would not be able to comply.
As the Payment Card Industry Data Security Standard (PCI DSS) includes access to logs, auditors and regulators may refer to them when auditing a security report. Security managers must ensure that they obtain access to the service provider's logs
as part of any service agreements. Cloud apps are constantly being enhanced with new features, and users must remain up-to-date about app improvements to make sure they are protected. The speed at which cloud applications change affects both the SDLC and security. For example, Microsoft's SDLC assumes that mission-critical software will not change substantially for three to five years, but a cloud application may change every couple of weeks. Unfortunately, a secure SDLC cannot deliver a security cycle that keeps pace with such rapid change. This means that users have to update continuously, as an older version may not work or may not protect the data.
Appropriate fail-over technology is an often-overlooked aspect of securing the cloud. A company cannot survive if a mission-critical application goes offline, although it may survive the loss of non-mission-critical applications. Security must shift to the device level, so that businesses can ensure that their data is secured wherever it goes. In cloud computing, security at the data level is one of the major challenges.
In a cloud world, the majority of compliance requirements are difficult to enforce. There is a wide range of IT security and compliance standards regulating most business interactions, and these will have to be translated to the cloud over time. SaaS makes it much more difficult for a customer to determine where its data resides in a network managed by its SaaS provider or a partner of that provider, posing all kinds of data protection, aggregation and security enforcement concerns. Many compliance regulations require that data not be intermixed with other data on shared servers or databases. Some national governments place strict restrictions on what data their citizens can store and for how long. Some banking regulations require that customers' financial data remain in their home country. Through cloud-based applications, many mobile IT users can access business data and infrastructure without going through the corporate network; this increases the need for businesses to monitor security between mobile users and cloud-based services. Placing large amounts of confidential information in a global cloud exposes companies to wide-ranging distributed threats : attackers no longer have to come and steal the data physically, since all of it can be found in one "virtual" location. Cloud virtualization efficiency requires that virtual machines from multiple organizations be co-located on the same physical resources. Although the security of a traditional data center remains in place in the cloud environment, physical separation and hardware-based security cannot protect virtual machines on the same server from attack. Management access is via the Internet, instead of the direct or on-site, monitored and restricted connections of the conventional data center model. This increases risk and the need for visibility, and demands strict monitoring of changes to system control and of access control restrictions.
The complex and flexible nature of virtual machines makes it hard to maintain and audit security. It can be difficult to demonstrate the security status of a device and to detect the location of an unsafe virtual machine. No matter where a virtual machine is located in the virtual environment, intrusion detection and prevention systems must be able to detect malicious activity on it. The interconnection of several virtual machines increases the attack surface and the risk of compromise spreading between machines. Individual virtual machines and physical servers in a cloud server environment use the same operating systems along with business and web applications, which raises the threat of an attacker or malware exploiting vulnerabilities remotely. Virtual machines also become vulnerable as they are moved between a private cloud and a public cloud. A fully or partially shared cloud system has a greater attack surface and is therefore at greater risk than a dedicated resource environment. Operating systems and application files in a virtualized cloud environment reside on shared physical infrastructure, which requires system, file and activity monitoring to provide confidence and auditable proof to corporate clients that their resources have not been compromised or tampered with.
In the cloud computing environment, the organization uses cloud computing resources for which the subscriber, not the cloud provider, is responsible for patching. Therefore, it is essential to have patch maintenance awareness. Companies are frequently required to prove that their conformance with security regulations, standards and auditing practices is consistent, irrespective of the location of the systems on which the data resides. In the cloud environment, data is fluid : it can be placed on on-premises physical servers, on-site virtual machines, or off-premises on virtual cloud computing services, and auditors and practice managers may have to reconsider how they assess it. Many companies are likely to rush into cloud computing without serious consideration of the security implications, in their effort to profit from the benefits of cloud computing, including significant cost savings.
To create zones of trust in the cloud, the virtual machines must protect themselves, essentially moving the perimeter to the virtual machine itself. Enterprise perimeter security is provided through firewalls, network segmentation, IDS/IPS, monitoring tools, De-Militarized Zones (DMZs) and the security policies associated with them. These security strategies and policies control the data that resides behind or transits the perimeter. In the cloud computing environment, the cloud service provider is responsible for the security and privacy of the customer's data.
a) Compliance issue
Compliance relates to the regulatory standards imposed on the use of personal information, or data privacy, by a country's laws or legislation. Compliance places restrictions on how personally identifiable information may be used or shared by cloud service providers. Various regulatory standards exist in the USA for data privacy, such as the USA PATRIOT Act, HIPAA, GLBA and FISMA.
The compliance concern depends on various factors such as applicable laws, regulations, standards, contractual commitments and privacy requirements. For example, as the cloud is a multitenant environment, users' data is stored across multiple countries, regions or states, and each region or country has its own legislation on the use and sharing of personal data, which restricts the usage of such data.
b) Storage issue
In the cloud, storage is a major privacy issue because the multitenant cloud environment makes multiple copies of users' data and stores them in multiple data centers across multiple countries. Therefore, users never know where their personal data is stored or in which country. The storage concern relates to where users' data is stored. The main concerns for a user or organization are : where is the data stored ? Was it transferred to another data center in another country ? What privacy standards are enforced by those countries, and do they limit the transfer of personal data ?
c) Retention issue
The retention issue relates to the duration for which personal data is kept in storage, as governed by retention policies. Each Cloud Service Provider (CSP) has its own set of retention policies that govern the data, so the user or organization has to look at the retention policy used by the CSP, along with its exceptions.
d) Access issue
The access issue relates to an organization's ability to provide individuals with access to their personal information and to comply with stated requests. The user or organization has the right to know what personal data is kept in the cloud and can request the CSP to stop processing it or to delete it from the cloud.
f) Destruction of data
At the end of the retention period, CSPs are expected to destroy Personally Identifiable Information (PII). The concern here is that organizations never come to know whether their data or PII in the cloud has actually been destroyed by the CSP, whether additional copies have been kept, or whether it has merely been made inaccessible to the organization.
The research firm Gartner lists seven security issues that should be discussed with a cloud computing provider :
Data location : Is it possible for the provider to check where the data is located ?
Data segregation : Ensure that encryption is effective at all times and that the encryption schemes were designed and tested by qualified experts.
Recovery : Find out what will happen to data in the event of a disaster. Does the provider offer full restoration ? If so, how long does it take ?
Privileged user access : Find out who has privileged access to data and how such administrators are hired and managed.
Regulatory compliance : Ensure the vendor is willing to be audited externally and/or certified for security.
Long-term viability : What happens to the data if the company goes out of business ? How, and in what format, will the data be returned ?
Investigative support : Is the vendor able to investigate any inappropriate or illegal activity ?
It is now more difficult to assess data protection, which means that data security roles are more critical than in past years. Beyond the Gartner report, a useful tactic is to encrypt the data yourself. If you encrypt the data using a trustworthy algorithm, then the data will only be readable with the decryption key, regardless of the service provider's security and encryption policies. This, of course, leads to a further problem : how do you manage private keys in a pay-on-demand computing infrastructure ? SaaS providers will have to incorporate and enhance the security practices that managed service providers
provide, and develop new practices as the cloud environment evolves, in order to deal with the security issues mentioned above along with those discussed earlier. A structured agreement on the security organization and its initiatives is one of the most critical activities for a security team. It fosters a shared view of what the security leadership is and aims to achieve, which encourages 'ownership' of the group's success.
1. Risk Assessment
A security risk assessment is crucial for helping the information security organization make informed decisions when balancing the dueling goals of business utility and asset protection. Failure to carry out formal risk assessments can contribute to an increase in information security audit findings, can jeopardize certification goals, and can lead to an ineffective, inefficient collection of security controls that cannot adequately mitigate information security risks to an acceptable level. A structured information security risk management process can proactively identify, prepare for and manage security risks on a daily or as-required basis. Applications and infrastructure will also receive further and more comprehensive technical risk assessments in the form of threat modeling. This can help product management and engineering groups be more proactive in design and testing, and in collaborating with the internal security team. Threat modeling requires knowledge of both IT and business processes, as well as technical knowledge of how the applications or systems under review work.
2. Risk Management
The identification of technological assets; identification of data with its connections to
processes, applications, and storage of data; and assignment of ownership with custodial
responsibilities are part of effective risk management. The risk management measures
will also involve maintaining an information asset repository. Owners have the
responsibility and privileges to ensure the confidentiality, integrity, availability and
privacy of information assets, including protective requirements. A formal risk
assessment process for allocating security resources related to business continuity must
be developed.
4. Security Awareness
Security awareness and culture are among the few effective methods for handling the human risks in security. Failure to provide people with adequate knowledge and training may expose the organization to a number of security risks in which the threats and entry points are persons rather than system or application vulnerabilities. The risks caused by the lack of an effective security awareness program include social engineering attacks, reputation damage, slow responses to potential security incidents, and the inadvertent leakage of customer data. A one-size-fits-all approach to security awareness is not necessarily right for all SaaS organizations; an information security awareness and training program that adapts the information and training to a person's role in the organization is more important. For example, development engineers can receive security awareness training in the form of secure coding and testing training, while data privacy and security certification training can be provided to customer service representatives. Ideally, the approach should combine both generic and role-specific content.
project management that results in a project not being completed and never realizing its expected returns. Excessive and unrealistic workload expectations occur because projects are not prioritized in accordance with policy, goals and resource availability. The security team should ensure that a project plan and a project manager with appropriate training and experience are in place for each new project undertaken by the security team, so that the project can be seen through to completion. Portfolio and project management capabilities can be enhanced by developing methodology, tools and processes that support the expected complexity of projects, for both traditional business practices and cloud-based approaches.
recover from failure, but can also reduce the overall complexity, cost and risk of managing your most critical applications on a regular basis. There are also promising prospects in the cloud for cost-effective BC/DR (Business Continuity / Disaster Recovery) solutions.
The Secure Software Development Life Cycle consists of six phases which are shown
in Fig. 4.7.1 and described as follows.
I. Initial Investigation : To define and document project processes and goals in the
security policy of the program.
II. Requirement Analysis : To analyze recent security policies and systems,
assessment of emerging threats and controls, study of legal issues, and perform
risk analysis.
III. Logical design : To develop a security plan; plan incident response measures and business responses to disaster; and determine whether the project can be continued and/or outsourced.
IV. Physical design : Selecting technologies to support the security plan, developing a
solution definition that is successful, designing physical security measures to
support technological solutions and reviewing and approving plans.
V. Implementation : Purchase or create solutions for security. Submit a tested
management package for approval at the end of this stage.
VI. Maintenance : The application code is monitored, tested and maintained continuously for efficient enhancement. Additional security processes are also developed to support application development projects, such as external and internal penetration testing and standard security requirements for data classification. Formal training and communication should also be introduced to raise awareness of process improvements.
4.9 IAM
Identity and Access Management (IAM) is a vital function for every organisation, and SaaS customers have a fundamental expectation that their data is protected according to the principle of least privilege. The least privilege principle says that only the minimum access required to perform an operation should be granted, and only for the minimum amount of time required. Aspects of current models including trust principles,
The IAM architecture is made up of several processes and activities (see Fig. 4.9.2). The
processes supported by IAM are given as follows.
TLS authentication is typically one-way : only the server is authenticated, since the client already knows the identity of the server, while the client itself is not authenticated. This means that, at the browser level, the browser has validated the server's certificate and checked the digital signatures of the chain of Certification Authorities (CAs) that issued the server certificate. No validation of the server's identity is performed on behalf of the end user. To truly identify the server, the end user must verify the identifying information contained in the server's certificate. That is the only way for end users to know the server's "identity", and the only way to establish it securely is to check that the server's certificate specifies the URL, name or address that the user is actually visiting. A malicious website cannot use the valid certificate of another website, because it has no means of encrypting the transmission in a way that can be decrypted with the genuine certificate. Since only a trusted CA can embed a URL in a certificate, it is appropriate to compare the apparent URL with the URL specified in the certificate. TLS also supports a more secure bilateral connection mode, which ensures that both ends of the connection are communicating with the party they believe they are connected to. This is called mutual authentication. For mutual authentication, the TLS client side must also hold a certificate.
TLS involves three basic phases. First, peer negotiation of algorithm support : cipher suites are negotiated between the client and the server to determine the ciphers to be used. Second, authentication and key exchange : decisions are made about the authentication and key-exchange algorithms to be used; these are public-key algorithms. Third, message authentication using symmetric-cipher encryption : this determines the message authentication codes, for which cryptographic hash functions are used. Once these decisions are made, the transfer of data can commence.
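A minimal sketch of one-way TLS as described above, using Python's standard ssl module, is shown below. The host name is a placeholder, and the commented line indicates where an (assumed) client certificate would be loaded for mutual TLS.

    # Sketch : one-way TLS with certificate verification using the standard library.
    import socket
    import ssl

    hostname = "example.com"                      # placeholder server
    context = ssl.create_default_context()        # verifies the CA chain and host name
    # context.load_cert_chain("client.crt", "client.key")  # uncomment for mutual TLS

    with socket.create_connection((hostname, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            # Confirm that the certificate really names the host we intended to reach.
            print(tls.version(), tls.getpeercert()["subject"])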
Summary
The provisioning of storage resources in cloud is often associated with the terms
like distributed file system, storage technologies and databases.
There are three methods of resource provisioning namely Demand-Driven,
Event-Driven and Popularity-Driven.
The cloud providers can expand or redimension their provision capacity in a
competitive and dynamic manner by leasing the computation and storage
resources of other cloud service providers with the use of Intercloud
architectural principles.
Although cloud computing has many benefits, security issues in cloud platforms have led many companies to hesitate to migrate their essential resources to the cloud.
Even though cloud computing and virtualization can enhance business efficiency by breaking the physical ties between an IT infrastructure and its users, it is important to resolve the increased security threats in order to fully benefit from this new computing paradigm.
Some security issues in cloud platforms are trust, privacy, lack of security and
copyright protection.
Key privacy issues in the cloud computing are Compliance issue, Storage
concern, Retention concern, Access Concern, Auditing and monitoring and so
on.
The lack of a formalized strategy can lead to the development of an unsupportable operating model and an inadequate level of security.
The essential factors required in security governance are Risk Assessment and
management, Security Awareness, Security Portfolio Management, Security
Standards, Guidelines and Policies, Security Monitoring and Incident Response,
Business Continuity Plan and Disaster Recovery and so on.
To overcome the security attacks on VMs, Network level IDS or Hardware level
IDS can be used for protection, shepherding programs can be applied for code
execution control and verification and additional security technologies can be
used.
Identity and Access Management is a security framework composed of policy and governance components used for the creation, maintenance and termination of digital identities, with controlled access to shared resources. It is composed of multiple processes, components, services and standard practices.
Security standards are needed to define the processes, measures and practices
required to implement the security program in a web or network environment.
Q.1 List any four host security threats in public IaaS. AU : Dec.-17
Ans. : The most common host security threats in the public IaaS cloud are :
Hijacking of accounts that are not properly secured.
Stealing keys, such as SSH private keys, that are used to access and manage hosts.
Attacking unpatched and vulnerable services listening on standard ports such as FTP, NetBIOS and SSH.
Attacking systems that are not protected by host firewalls.
Deploying Trojans and embedding viruses in the software running inside the VM.
Q.1 “In today’s world, infrastructure security and data security are highly challenging at
network, host and application levels”, Justify and explain the several ways of protecting the
data at transit and at rest. AU : May-18
Ans. : Refer section 4.4.1 to 4.4.4.
Q.2 Explain the baseline Identity and access management (IAM) factors to be practices by
the stakeholders of cloud services and the common key privacy issues likely to happen in the
cloud environment. AU : May-18
Ans. : Refer section 4.9 for Identity and access management and 4.5.1 for the common
key privacy issues likely to happen in the environment.
Q.3 What is the purpose of IAM ? Describe its functional architecture with an illustration.
AU : Dec.-17
Ans. : Refer section 4.9.
Q.4 Write details about cloud security infrastructure. AU : Dec.-16
Ans. : Refer section 4.4.
Q.5 Write detailed note on identity and access management architecture. AU : May-17
Ans. : Refer section 4.9.
Q.6 Describe the IAM practices in SaaS, PaaS and IaaS availability in cloud. AU : Dec.-19
Ans. : Refer section 4.9.
Q.7 How is the identity and access management established in cloud to counter threats ?
AU : May-19
Ans. : Refer section 4.9.
Q.8 Write detailed note on Resource Provisioning and Resource Provisioning Methods
Ans. : Refer section 4.2.
Q.9 How Security Governance can be achieved in cloud computing environment
Ans. : Refer section 4.7.
Q.10 Explain different Security Standards used in cloud computing
Ans. : Refer section 4.10.
5
Cloud Technologies and
Advancements
Syllabus
Hadoop – MapReduce – Virtual Box -- Google App Engine – Programming Environment for
Google App Engine - Open Stack – Federation in the Cloud – Four Levels of Federation –
Federated Services and Applications – Future of Federation.
Contents
5.1 Hadoop
5.2 Hadoop Distributed File system (HDFS)
5.3 Map Reduce
5.4 Virtual Box
5.5 Google App Engine
5.6 Programming Environment for Google App Engine
5.7 Open Stack
5.8 Federation in the Cloud
5.9 Four Levels of Federation
5.10 Federated Services and Applications
5.11 The Future of Federation
(5 - 1)
5.1 Hadoop
With the evolution of the internet and related technologies, high computational power, large volumes of data storage and faster data processing have become basic needs for most organizations, and these needs have increased significantly over time. Currently, organizations are producing huge amounts of data at a fast rate. Recent surveys on data generation report that Facebook produces roughly 600+ TB of data per day and analyzes 30+ petabytes of user-generated data, a Boeing jet airplane generates more than 10 TB of data per flight including geo maps, special images and other information, and Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
So, there is a need to acquire, analyze, process, handle and store such huge amounts of data, called big data. The different challenges associated with big data are given below :
a) Volume : Volume relates to the size of big data. The amount of data is growing day by day and is very large. According to IBM, in the year 2000, 8 lakh (800,000) petabytes of data were stored in the world. The challenge here is how to deal with such huge big data.
b) Variety : Variety relates to the different formats of big data. Nowadays, most of the data stored by organizations has no proper structure and is called unstructured data. Such data has a complex structure and cannot be represented using rows and columns. The challenge here is how to store different formats of data in databases.
c) Velocity : Velocity relates to the speed of data generation, which is very fast. It is the rate at which data is captured, generated and shared. The challenge here is how to react to the massive flow of information in the time required by the application.
d) Veracity : Veracity refers to the uncertainty of data. The data stored in databases is sometimes inaccurate or inconsistent, which makes for poor data quality. Inconsistent data requires a lot of effort to process.
Traditional database management techniques are incapable of satisfying the above four characteristics and do not support storing, processing, handling and analyzing big data. Therefore, the challenges associated with big data can be solved using one of the most popular frameworks, provided by Apache, called Hadoop.
Apache Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers using simple programming models.
5) Pig : A platform for analyzing large data sets using a high-level language. It uses a dataflow language and provides a parallel execution framework.
Of all the above components, HDFS and MapReduce are the two core components of the Hadoop framework; they are explained in the next sections.
1. Name Node
An HDFS cluster consists of a single name node, called the master server, that manages the file system namespace and regulates access to files by clients. It runs on commodity hardware and stores all the metadata for the file system across the cluster. The name node serves as the single arbitrator and repository for HDFS metadata, which is kept in main memory for faster random access. The entire file system namespace is contained in a file called FsImage stored on the name node's file system, while the transaction log is recorded in the EditLog file.
2. Data Node
In HDFS, there are multiple data nodes that manage the storage attached to the nodes they run on. They are used to store users' data on HDFS clusters.
Internally, a file is split into one or more blocks that are stored on data nodes. The data nodes are responsible for handling read/write requests from clients. They also perform block creation, deletion and replication upon instruction from the name node. A data node stores each HDFS data block in a separate file, and several blocks are stored on different data nodes. The requirement of such a block-structured file system is to store, manage and access file metadata reliably.
The representation of name node and data node is shown in Fig. 5.2.2.
3. HDFS Client
In Hadoop distributed file system, the user applications access the file system using
the HDFS client. Like any other file systems, HDFS supports various operations to read,
write and delete files, and operations to create and delete directories. The user references
files and directories by paths in the namespace. The user application does not need to be aware that file system metadata and storage are on different servers, or that blocks have
multiple replicas. When an application reads a file, the HDFS client first asks the name
node for the list of data nodes that host replicas of the blocks of the file. The client
contacts a data node directly and requests the transfer of the desired block. When a client
writes, it first asks the name node to choose data nodes to host replicas of the first block of
the file. The client organizes a pipeline from node-to-node and sends the data. When the
first block is filled, the client requests new data nodes to be chosen to host replicas of the
next block. The choice of data nodes for each block is likely to be different.
4. HDFS Blocks
In general, users' data is stored in HDFS in terms of blocks. The files in the file system are divided into one or more segments called blocks. The default size of an HDFS block is 64 MB, which can be increased as per need.
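As a rough illustration of the block abstraction just described, the pure-Python sketch below splits a byte stream into fixed-size blocks and assigns each block to a set of data nodes for replication. The 64 MB default and the replication factor of 3 follow the text, while the data node names and the tiny demo input are made up; this is not actual HDFS code.

    # Toy sketch of HDFS-style block splitting and replica placement.
    from itertools import cycle

    BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB default block size
    REPLICATION = 3                 # default replication factor

    DATA_NODES = ["dn1", "dn2", "dn3", "dn4"]   # hypothetical data node names

    def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
        """Yield consecutive fixed-size blocks of the input data."""
        for offset in range(0, len(data), block_size):
            yield data[offset:offset + block_size]

    def place_replicas(num_blocks: int):
        """Assign each block to REPLICATION distinct data nodes, round-robin style."""
        nodes = cycle(DATA_NODES)
        placement = []
        for _ in range(num_blocks):
            placement.append([next(nodes) for _ in range(REPLICATION)])
        return placement

    # Tiny demo input stands in for a large file; block_size is shrunk accordingly.
    blocks = list(split_into_blocks(b"abcdefghij", block_size=4))
    print(len(blocks), place_replicas(len(blocks)))   # -> 3 blocks, 3 replicas each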
HDFS is fault tolerant : if a data node fails, the current block write operation on that data node is re-replicated to some other node. The block size, number of
replicas and replication factors are specified in the Hadoop configuration file. Synchronization between the name node and the data nodes is achieved through heartbeat messages, which are periodically sent by the data nodes to the name node.
Apart from the above components, a job tracker and task trackers are used when a MapReduce application runs over HDFS. The Hadoop core consists of one master job tracker and several task trackers. The job tracker runs on the name node as a master, while the task trackers run on the data nodes as slaves.
The job tracker is responsible for taking requests from a client and assigning tasks to task trackers. The job tracker always tries to assign tasks to the task trackers on the data nodes where the data is locally present. If for some reason a node fails, the job tracker assigns the task to another task tracker where a replica of the data exists, since the data blocks are replicated across data nodes. This ensures that the job does not fail even if a node fails within the cluster.
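A toy sketch of the locality-first assignment just described is given below; the block locations and tracker names are invented for illustration, and this is not the actual Hadoop scheduler.

    # Toy sketch of data-locality-aware task assignment.

    # Hypothetical mapping : which data nodes hold a replica of each block.
    block_locations = {
        "block-1": ["dn1", "dn2", "dn3"],
        "block-2": ["dn2", "dn3", "dn4"],
    }
    alive_trackers = {"dn1", "dn3", "dn4"}   # dn2 has failed

    def assign(block: str) -> str:
        """Prefer a live task tracker that already holds a replica of the block."""
        for node in block_locations[block]:
            if node in alive_trackers:
                return node                   # data-local assignment
        return next(iter(alive_trackers))     # fall back to any live tracker

    for block in block_locations:
        print(block, "->", assign(block))     # block-1 -> dn1, block-2 -> dn3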
Every MapReduce program undergoes different phases of execution. Each phase has
its own significance in MapReduce framework. The different phases of execution in
MapReduce are shown in Fig. 5.3.2 and explained as follows.
In the input phase, a large data set in the form of <key, value> pairs is provided as the standard input for the MapReduce program. The input files used by MapReduce are kept in the HDFS (Hadoop Distributed File System) store and have a standard InputFormat specified by the user.
Once the input file is selected, the split phase reads the input data and divides it into smaller chunks. The split chunks are then given to the mapper. The map operations extract the relevant data and generate intermediate key-value pairs. The mapper reads input data from the split using a record reader and generates intermediate results; it transforms the input key-value list into an output key-value list, which is then passed to the combiner.
The combiner is used with both the mapper and the reducer to reduce the volume of data transfer. It is also known as a semi-reducer, which accepts input from the mapper and passes output key-value pairs to the reducer. Shuffle and sort are components of the reducer. Shuffling is the process of partitioning and moving the mapped output to the reducers, where intermediate keys are assigned to a reducer. Each partition is called a subset, and each subset becomes input to a reducer. In general, the shuffle phase ensures that the partitioned splits reach the appropriate reducers, where each reducer uses the HTTP protocol to retrieve its own partition from the mappers.
The sort phase is responsible for automatically sorting the intermediate keys on a single node before they are presented to the reducer. The shuffle and sort phases occur simultaneously, with mapped outputs being fetched and merged.
The reducer reduces a set of intermediate values that share a unique key to a smaller set of values. The reducer uses the sorted input to generate the final output, which it writes using a record writer into an output file with the standard output format. The final output of each MapReduce program is generated as key-value pairs written to an output file, which is written back to the HDFS store. An example of the word count process using MapReduce, with all phases of execution, is illustrated in Fig. 5.3.3.
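A compact, self-contained Python sketch of the word-count flow described above is shown below; it only simulates the map, shuffle/sort and reduce phases in memory and is not tied to the actual Hadoop APIs. The sample input splits are illustrative.

    # Pure-Python simulation of the MapReduce word-count phases (map, shuffle/sort, reduce).
    from collections import defaultdict

    def mapper(line: str):
        """Map phase : emit an intermediate <word, 1> pair for every word in the line."""
        for word in line.split():
            yield word.lower(), 1

    def shuffle(pairs):
        """Shuffle/sort phase : group all values belonging to the same intermediate key."""
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return sorted(groups.items())          # sorted by key, as the sort phase does

    def reducer(key, values):
        """Reduce phase : combine the list of counts for one word into a final total."""
        return key, sum(values)

    splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]   # input splits
    intermediate = [pair for line in splits for pair in mapper(line)]
    final_output = [reducer(k, v) for k, v in shuffle(intermediate)]
    print(final_output)   # [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]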
Fig. 5.5.1 : Functional architecture of the Google cloud platform for app engine
The infrastructure for the Google cloud is managed inside Google's data centers. All the cloud services and applications on Google run on servers inside these data centers. Inside each data center, there are thousands of servers forming different clusters, and each cluster can run multipurpose servers. The infrastructure for GAE is composed of four main components : Google File System (GFS), MapReduce, BigTable and Chubby. GFS is used for storing large amounts of data on Google's storage clusters. MapReduce is used for application program development with data processing on large clusters. Chubby is used as a distributed application locking service, while BigTable offers a storage service for accessing structured as well as unstructured data. In this architecture, users can interact with Google applications via the web interface provided by each application.
The GAE platform comprises five main components :
An application runtime environment that offers a platform with a built-in execution engine for scalable web programming and execution.
A Software Development Kit (SDK) for local application development and deployment onto the Google cloud platform (a minimal deployment sketch is given below).
A Datastore to provision object-oriented, distributed, structured data storage for applications and their data. It also provides secure data management operations based on BigTable techniques.
An Admin console used for easy management of user application development and resource management.
A GAE web service for providing APIs and interfaces.
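As a hedged illustration of the runtime and SDK components above, the sketch below shows what a minimal application for the App Engine Python standard environment might look like, assuming the Flask framework is used; the file names, runtime version and deployment command are illustrative and should be checked against Google's current documentation.

    # main.py - minimal sketch of a GAE (Python standard environment) web application,
    # assuming Flask is listed in requirements.txt.
    #
    # A matching app.yaml (placed next to this file) might contain just :
    #   runtime: python39
    #
    # Deployment would then typically be done with the gcloud SDK, e.g. :
    #   gcloud app deploy
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        # App Engine routes incoming HTTP requests to this handler.
        return "Hello from Google App Engine!"

    if __name__ == "__main__":
        # Local development server; App Engine itself uses its own entry point.
        app.run(host="127.0.0.1", port=8080, debug=True)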
GFS provides a file system interface and different APIs for supporting different file
operations such as create to create a new file instance, delete to delete a file instance, open to
open a named file and return a handle, close to close a given file specified by a handle,
read to read data from a specified file and write to write data to a specified file.
As can be seen from Fig. 5.6.1, a single GFS Master and three chunk servers serving two clients comprise a GFS cluster. These clients and servers, as well as the Master, are Linux machines, each running a server process at the user level. These processes are known as user-level server processes.
In GFS, the metadata is managed by the GFS Master that takes care of all the
communication between the clients and the chunk servers. Chunks are small blocks of
data that are created from the system files. Their usual size is 64 MB. The clients interact
directly with chunk servers for transferring chunks of data. For better reliability, these
chunks are replicated across three machines so that whenever the data is required, it can
be obtained in its complete form from at least one machine. By default, GFS stores three
replicas of the chunks of data. However, users can designate any level of replication.
Chunks are created by dividing the files into fixed-sized blocks. A unique immutable
handle (of 64-bit) is assigned to each chunk at the time of their creation by the GFS
Master. The data that can be obtained from the chunks, the selection of which is specified
by the unique handles, is read or written on local disks by the chunk servers. GFS has all
the familiar system interfaces. It also has additional interfaces in the form of snapshots
and appends operations. These two features are responsible for creating a copy of files or
folder structure at low costs and for permitting a guaranteed atomic data-append
operation to be performed by multiple clients of the same file concurrently.
Applications use a file-system-specific Application Programming Interface (API) that is implemented by the code written for the GFS client. Further, communication with the GFS Master and chunk servers is established to perform read and write operations on behalf of the application. Clients interact with the Master only for metadata operations; data-bearing communications are forwarded directly to the chunk servers. The POSIX API, a feature common to most popular file systems, is not included in GFS, and therefore a Linux vnode layer hook-in is not required. Neither clients nor servers cache file data. Due to the streamed workload, caching provides little benefit to clients, while caching by servers has little consequence, because a buffer cache already keeps frequently requested files locally.
The GFS provides the following features :
Large - scale data processing and storage support
Normal treatment for components that stop responding
It is composed of three entities, namely the client, the Big Table master and the tablet servers. Big Tables are implemented over one or more clusters that are similar to GFS clusters. The client application uses libraries to execute Big Table queries on the master server. A Big Table is initially broken up into one or more tablets, served by slave servers called tablet servers, for the execution of secondary tasks. Each tablet is 100 to 200 MB in size.
The master server is responsible for allocating tablets to tasks, clearing garbage
collections and monitoring the performance of tablet servers. The master server splits
tasks and executes them over tablet servers. The master server is also responsible for
maintaining a centralized view of the system to support optimal placement and load-
balancing decisions. It performs separate control and data operations strictly with tablet
servers. Upon granting the tasks, tablet servers provide row access to clients. Fig. 5.6.3
shows the structure of Big table :
Big Table is arranged as a sorted map that is spread across multiple dimensions and is sparse, distributed and persistent. The Big Table data model primarily combines three dimensions, namely row, column and timestamp. The first two dimensions are string types, whereas the time dimension is a 64-bit integer. The value resulting from the combination of these dimensions is a string.
Each row in Big Table has an associated row key that is an arbitrary string of up to 64 KB in size. In Big Table, a row name is a string, and rows are ordered lexicographically. Although Big Table rows do not support the relational model, they offer atomic access to the data, which means you can access only one record at a time. Rows contain a large amount of data about a given entity, such as a web page; row keys may be URLs that reference information about the resources identified by those URLs.
The naming conventions that are used for columns are more structured than those of
rows. Columns are organized into a number of column families that logically groups the
data under a family of the same type. Individual columns are designated by qualifiers
within families. In other words, a given column is referred to using the syntax column_family:optional_qualifier, where column_family is a printable string and qualifier is an arbitrary string. It is necessary to provide an arbitrary name to one level, which is known as the column family, but it is not mandatory to give a name to a qualifier. The column family contains information about the data type and is actually the unit of access control.
Qualifiers are used for assigning columns in each row. The number of columns that can
be assigned in a row is not restricted.
The other important dimension that is assigned to Big Table is a timestamp. In Big
table, the multiple versions of data are indexed by timestamp for a given cell. The
timestamp is either related to real-time or can be an arbitrary value that is assigned by a
programmer. It is used for storing various data versions in a cell. By default, any new
data that is inserted into Big Table is taken as current, but you can explicitly set the
timestamp for any new write operation in Big Table. Timestamps provide the Big Table
lookup option that returns the specified number of the most recent values. It can be used
for marking the attributes of the column families. The attributes either retain the most
recent values in a specified number or keep the values for a particular time duration.
Big Table supports APIs that developers can use to perform a wide range of operations such as metadata operations, read/write operations and modify/update operations. The commonly used API operations are as follows :
Creation and deletion of tables
Creation and deletion of column families within tables
Writing or deleting cell values
Accessing data from rows
Associating metadata, such as access control information, with tables and column families
The functions used for atomic write operations, sketched in the example below, are as follows :
Set() is used for writing cells in a row.
DeleteCells() is used for deleting cells from a row.
DeleteRow() is used for deleting an entire row, i.e., all the cells of the row are deleted.
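A hypothetical Python-flavoured sketch of how these atomic write calls might look in client code is given below; the RowMutation class and the apply() call in the final comment are illustrative only, loosely modelled on the calls named above, and are not an official Big Table client API.

class RowMutation:
    """Collects write operations that are applied atomically to a single row."""
    def __init__(self, row_key):
        self.row_key = row_key
        self.operations = []

    def set(self, column, value):          # Set() : write a cell in the row
        self.operations.append(("set", column, value))

    def delete_cells(self, column):        # DeleteCells() : delete cells from the row
        self.operations.append(("delete_cells", column))

    def delete_row(self):                  # DeleteRow() : delete all cells in the row
        self.operations.append(("delete_row",))

mutation = RowMutation("com.cnn.www")
mutation.set("anchor:cnnsi.com", "CNN")
mutation.delete_cells("anchor:stale-link.com")
# table.apply(mutation) would then submit every queued operation atomically.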
It is clear that Big Table is a highly reliable, efficient and fast system that users can employ for storing different types of semi-structured or unstructured data.
5.6.3 Chubby
Chubby is a crucial service in the Google infrastructure that offers storage and coordination for other infrastructure services such as GFS and Bigtable. It is a coarse-grained distributed locking service that is used for synchronizing distributed activities in an asynchronous environment on a large scale. It is used as a name service within Google and provides reliable storage for file systems along with the election of a coordinator among multiple replicas. The Chubby interface is similar to the interfaces provided by
distributed systems with advisory locks. However, the aim of designing Chubby is to provide reliable storage with consistent availability. It is designed to be used with loosely coupled distributed systems consisting of several small machines connected by a high-speed network. The lock service enables the synchronization of the activities of clients and permits the clients to reach a consensus about the environment in which they are placed. Chubby's main aim is to efficiently handle a large set of clients by providing them a highly reliable and available system; other characteristics, such as throughput and storage capacity, are secondary. Fig. 5.6.4 shows the typical structure of a Chubby system :
The Chubby architecture involves two primary components, namely the server and the client library. Both components communicate through Remote Procedure Calls (RPCs), and the library's special purpose is to link clients against the Chubby cell. A Chubby cell contains a small set of servers, also called replicas; usually five servers are used in every cell. The Master is elected from these replicas through a distributed consensus protocol. A majority of the replicas must vote
for the Master, with the assurance that those replicas will not elect a different Master for a certain duration. This duration is termed the Master lease.
Chubby exposes a file system similar to that of Unix, although simpler. The files and directories, known as nodes, are contained in the Chubby namespace, and each node is associated with various kinds of metadata. Nodes are opened to obtain Unix-like file descriptors known as handles. Handle identifiers include check digits that prevent clients from guessing handles, handle sequence numbers, and mode information used to recreate the lock state when the Master changes.
Reader and writer locks are implemented by Chubby using files and directories. While exclusive permission for a lock in writer mode can be held by only a single client, any number of clients can share a lock in reader mode. The locks are advisory in nature, and a conflict occurs only when the same lock is requested for acquisition again. Distributed locking is complex : on one hand its use is costly, and on the other hand it only permits the numbering of interactions that already use locks. The status of a lock after it has been acquired can be described by a descriptor string called a sequencer. A sequencer is requested by the lock holder and passed by clients to servers so that operations can proceed under its protection.
Another important concept used with Chubby is an event, to which clients can subscribe after creating handles. An event is delivered when the action that corresponds to it is completed. An event can be :
a. Modification in the contents of a file
b. Addition, removal, or modification of a child node
c. Failing over of the Chubby Master
d. Invalidity of a handle
e. Acquisition of lock by others
f. Request for a conflicting lock from another client
In Chubby, caching is done by the client, which stores file data and metadata to reduce the traffic for the reader lock. Handles and locks can also be cached, and the Master maintains a list of the clients that may have cached each item so that it can be invalidated. Thanks to caching, clients see consistent data; if consistency cannot be maintained, an error is flagged. Chubby maintains sessions between clients and servers with the help of keep-alive messages, which are exchanged every few seconds to remind the system that the session is still active. Handles held by clients are released by the server if the session expires for any reason. If the Master responds late to a keep-alive message, as may happen at
times, the client has its own timeout (which is longer than the server timeout) for the detection of server failure.
If a server failure has indeed occurred, the Master does not respond to the client's keep-alive message within the local lease timeout. This incident puts the session in jeopardy, which is recovered in the manner explained in the following points :
The cache needs to be cleared.
The client needs to wait for a grace period, which is about 45 seconds.
Another attempt is made to contact the Master.
If the attempt to contact the Master is successful, the session resumes and its jeopardy
is over. However, if this attempt fails, the client assumes that the session is lost. Fig. 5.6.5
shows the case of the failure of the Master :
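The client-side behaviour just described (keep-alives, jeopardy and the grace period) can be summarised in the following toy Python sketch; the helper callbacks and timing constants are illustrative only and are not Chubby's actual protocol code.

import time

GRACE_PERIOD_S = 45          # the grace period mentioned above

def maintain_session(send_keep_alive, local_lease_timeout_s, clear_cache):
    """send_keep_alive(timeout) returns True if the Master acknowledged in time."""
    while True:
        if send_keep_alive(timeout=local_lease_timeout_s):
            time.sleep(local_lease_timeout_s / 2)       # session healthy, keep refreshing
            continue
        # No reply within the local lease timeout : the session is in jeopardy.
        clear_cache()
        deadline = time.time() + GRACE_PERIOD_S
        while time.time() < deadline:
            if send_keep_alive(timeout=5):
                break                                    # Master reachable again, session resumes
            time.sleep(1)
        else:
            raise RuntimeError("Chubby session lost")    # grace period expired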
Chubby offers a decent level of scalability, which means that there can be any (unspecified) number of Chubby cells. If these cells are placed under heavy load, the lease timeout is increased; this increment can be anything between 12 seconds and 60 seconds. The data items are small and are held entirely in Random-Access Memory (RAM). The Chubby system also uses partitioning mechanisms to divide data into smaller pieces. With all of its services and applications taken together, Chubby has proved to be a great innovation when it comes to storage, locking and program support services.
Chubby is implemented using the following APIs :
1. Creation of handles using the open() method
2. Destruction of handles using the close() method
The other important methods include GetContentsAndStat(), GetStat(), ReadDir(), SetContents(), SetACL(), Delete(), Acquire(), TryAcquire(), Release(), GetSequencer(), SetSequencer() and CheckSequencer(). The commonly used APIs in Chubby are listed in Table 5.6.1 :
API : Description
Open : Opens the file or directory and returns a handle
Close : Closes the file or directory and releases the associated handle
GetContentsAndStat : Returns the file contents together with the metadata associated with the file
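As an illustration of how these calls fit together, the following Python-flavoured pseudocode sketches a primary election using the APIs named above; Chubby has no public client library, so the chubby_client object, its method names and the lock path are hypothetical.

def elect_primary(chubby_client):
    # Open() : obtain a handle on the lock node (created if it does not exist).
    handle = chubby_client.open("/ls/example-cell/service/primary-lock")
    if handle.try_acquire():                       # TryAcquire() : non-blocking lock attempt
        sequencer = handle.get_sequencer()         # GetSequencer() : describes the held lock
        handle.set_contents(b"primary=host-42")    # SetContents() : advertise the new primary
        return sequencer                           # later passed to servers with each request
    # Someone else is primary : read who holds the lock, then release the handle.
    contents, metadata = handle.get_contents_and_stat()   # GetContentsAndStat()
    handle.close()                                 # Close()
    return None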
Swift : Swift provides storage services for storing files and objects. Swift can be equated with Amazon's Simple Storage Service (S3).
Cinder : This component provides block storage to Nova Virtual Machines. Its
working is similar to a traditional computer storage system where the computer is
able to access specific locations on a disk drive. Cinder is analogous to AWS’s EBS.
Glance : Glance is OpenStack's image service component that provides virtual templates (images) of hard disks. These templates can be used for new VMs. Glance may use either Swift or flat files to store these templates.
Neutron (formerly known as Quantum) : This component of OpenStack provides Networking-as-a-Service, Load-Balancer-as-a-Service and Firewall-as-a-Service. It also ensures communication between other components.
Heat : It is the orchestration component of OpenStack. It allows users to manage
infrastructural needs of applications by allowing the storage of requirements in
files.
Keystone : This component provides identity management in OpenStack.
Horizon : This is the dashboard of OpenStack, which provides a graphical interface.
Ceilometer : This component of OpenStack provisions meters and billing models
for users of the cloud services. It also keeps an account of the resources used by
each individual user of the OpenStack cloud. Let us also discuss some of the non-
core components of OpenStack and their offerings.
Trove : Trove is a component of OpenStack that provides Database-as-a-Service. It provisions relational databases and big data engines.
Sahara : This component provisions Hadoop clusters to enable easy management of data processing frameworks.
Zaqar : This component allows messaging between distributed application components.
Ironic : Ironic provisions bare-metal machines, which can be used as a substitute for VMs.
The basic architectural components of OpenStack, shown in Fig. 5.7.1, include its core and optional services/components. The optional services of OpenStack are also known as Big Tent services; OpenStack can be used without these components, or they can be added as per requirement.
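As a small illustration of how these services cooperate, the following sketch uses the openstacksdk Python library to boot a Nova instance from a Glance image on a Neutron network; it assumes openstacksdk is installed and a clouds.yaml entry named "mycloud" exists, and the image, flavour and network names are placeholders for resources present in the deployment.

import openstack

conn = openstack.connect(cloud="mycloud")                  # credentials from clouds.yaml

image = conn.compute.find_image("cirros-0.6.2")            # served by Glance
flavor = conn.compute.find_flavor("m1.tiny")               # Nova flavour
network = conn.network.find_network("private")             # managed by Neutron

server = conn.compute.create_server(                       # Nova builds the VM
    name="demo-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)              # poll until ACTIVE
print(server.status)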
Compatibility : OpenStack supports both private and public clouds and is very easy to deploy and manage. OpenStack provides APIs compatible with Amazon Web Services, which eliminates the need to rewrite applications for AWS and enables easy portability between public and private clouds.
Security : OpenStack addresses security concerns, which are the topmost concerns for most organisations, by providing robust and reliable security systems.
Real-time Visibility : OpenStack provides real-time client visibility to administrators, including visibility of resources and instances, thus enabling administrators and providers to track what clients are requesting.
Live Upgrades : This feature allows upgrading services without any downtime. Earlier, upgrades required shutting down complete systems, which resulted in loss of performance. OpenStack now enables upgrading running systems by requiring only individual components to shut down.
Apart from these, OpenStack offers other remarkable features, such as networking,
compute, Identity Access Management, orchestration, etc.
From Fig. 5.7.2, we can see that every service of OpenStack depends on other services within the system, and all these services exist in a single ecosystem working together to produce a virtual machine. Any service can be turned on or off depending on the VM to be produced. These services communicate with each other through APIs and, in some cases, through privileged admin commands.
Let us now discuss the relationship between the various components or services specified in the conceptual architecture of OpenStack. As you can see in Fig. 5.7.2, three components, Keystone, Ceilometer and Horizon, are shown on top of the OpenStack platform.
Here, Horizon is providing user interface to the users or administrators to interact with
underlying OpenStack components or services, Keystone is providing authentication to
the user by mapping the central directory of users to the accessible OpenStack services,
and Ceilometer is monitoring the OpenStack cloud for the purpose of scalability, billing,
benchmarking, usage reporting and other telemetry services. Inside the OpenStack
platform, you can see that various processes are handled by different OpenStack services;
Glance registers Hadoop images, provides image services to OpenStack and allows the retrieval and storage of disk images. Glance stores the images in Swift, which is responsible for storing and serving data in the form of objects and files. The other OpenStack components also store data, such as job binaries, in Swift. Cinder, which offers persistent block storage or volumes to VMs, stores backup volumes in Swift as well. Trove stores backup databases in Swift and boots database instances via Nova, which is the main computing engine that provides and manages virtual machines using disk images.
Neutron enables network connectivity for VMs and facilitates PXE Network for Ironic
that fetches images via Glance. VMs are used by the users or administrators to avail and
provide the benefits of cloud services. All the OpenStack services are used by VMs in
order to provide best services to the users. The infrastructure required for running cloud
services is managed by Heat, which is the orchestration component of OpenStack that
orchestrates clusters and stores the necessary resource requirements of a cloud
application. Here, Sahara is used to offer a simple means of providing a data processing
framework to the cloud users.
Table 5.7.1 shows the dependencies of these services.
Keystone (Identity) : no dependencies
1. Jabber XCP
Instant Messaging (IM) allows users to exchange messages that are delivered
synchronously. As long as the recipient is connected to the service, the message will be
pushed to it directly. This can either be realized using a centralized server or peer to peer
connections between each client.
The Jabber Extensible Communications Platform (Jabber XCP) is a commercial IM server created by Cisco in association with Sun Microsystems. It is a highly
programmable presence and messaging platform. It supports the exchange of
information between applications in real time. It supports multiple protocols such as
Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions
(SIMPLE) and Instant Messaging and Presence Service (IMPS). It is a highly
programmable platform and scalable solution, which makes it ideal for adding presence
and messaging to existing applications or services and for building next-generation,
presence - based solutions.
In the current scenario, XMPP and XCP are extensively used for federation in the cloud due to their unique capabilities. The next sections of this chapter explain the levels of federation along with their applications and services.
Verified federation works at level 2 and runs above permissive federation. At this level, the server accepts a connection from a peer network server only when the identity of the peer has been verified or validated. Peer verification is the minimum criterion to run
encryption. The trusted root CAs are identified based on one or more factors like their
operating system environment, XMPP server software, or local service policy. The
utilization of trusted domain certificates prevents DNS poisoning attacks but makes federation more difficult, because under such circumstances the certificates, which must be signed by a CA, are difficult to obtain.
intermediate links. With federation, the email network can ensure encrypted connections and strong authentication by using certificates issued by trusted root CAs.
Apache Hadoop is an open-source software project that enables distributed processing of large data sets across clusters of commodity servers using simple programming models.
The Hadoop core is divided into two fundamental layers called HDFS and the MapReduce engine. HDFS is a distributed file system inspired by GFS that organizes files and stores their data on a distributed computing system, while MapReduce is the computation engine that runs on top of HDFS, which acts as its data storage manager.
HDFS follows a master-slave architecture using a name node and data nodes. The Name node acts as the master while multiple Data nodes work as slaves.
In the MapReduce model, the data processing primitives are called mapper and reducer. The mapper has a map method that transforms an input key-value pair into any number of intermediate key-value pairs, while the reducer has a reduce method that aggregates the intermediate key-value pairs into any number of output key-value pairs.
VirtualBox is a Type II (hosted) hypervisor that runs on Microsoft Windows, Mac OS, Linux and Solaris systems. It is ideal for testing, developing, demonstrating and deploying solutions across multiple platforms on a single machine.
If the Name Node does not receive any heartbeat response from a Data Node within a certain time (ten minutes by default), that particular Data Node is declared dead. If the death of a node causes the replica count of data blocks to drop below its minimum value, the Name Node initiates additional replication to bring them back to the normal state.
Q.3 Name the different modules in Hadoop framework. AU : May-17
Ans. : The Hadoop core is divided into two fundamental modules called HDFS and the MapReduce engine. HDFS is a distributed file system inspired by GFS that organizes files and stores their data on a distributed computing system, while MapReduce is the computation engine that runs on top of HDFS, which acts as its data storage manager. Apart from these, there are several other modules in Hadoop used for data storage, processing and analysis, which are listed below :
a. HBase : Column-oriented NoSQL database service
b. Pig : Dataflow language that provides a parallel data processing framework
c. Hive : Data warehouse infrastructure for big data
d. Sqoop : Tool for transferring bulk data between Hadoop and structured data stores
e. Oozie : Workflow scheduler system
f. Zookeeper : Distributed synchronization and coordination service
g. Mahout : Machine learning tool for big data.
Q.4 “HDFS” is fault tolerant. Is it true ? Justify your answer. AU : Dec.-17
Ans. : Fault tolerance refers to the ability of the system to work or operate uninterrupted even under unfavorable conditions (such as component failure due to a disaster or any other reason). The main purpose of fault tolerance is to mask frequently occurring failures that disturb the ordinary functioning of the system. The three main solutions used to provide fault tolerance in HDFS are data replication, heartbeat messages, and checkpoint and recovery.
In data replication, HDFS stores multiple replicas of the same data across different nodes of the cluster based on the replication factor. HDFS uses an intelligent replica placement model for reliability and performance. The same copy of data is positioned on several different computing nodes, so when that copy is needed it can be provided by any of those data nodes. The major advantage of this technique is instant recovery from node and data failures, but its main disadvantage is the extra storage consumed by keeping the same data on multiple nodes.
In heartbeat messages, a message is sent by each data node to the name node at a regular time interval to indicate its presence, i.e. to indicate that it is alive. If the Name Node does not receive a heartbeat from a Data Node within a certain time, that Data Node is declared dead. In that case, a node holding a replica of the data is used to recover it.
In checkpoint and recovery, a concept similar to rollback is used to tolerate faults up to some point. After a fixed time interval, a copy of the state is saved and stored; when a failure occurs, the system rolls back to the last saved checkpoint and then starts performing transactions again.
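The heartbeat and re-replication bookkeeping described above can be pictured with the following toy Python sketch; the thresholds and data structures are illustrative only, not Hadoop's actual NameNode implementation.

import time

HEARTBEAT_TIMEOUT_S = 600            # ten minutes, the default noted earlier
REPLICATION_FACTOR = 3

def find_dead_nodes(last_heartbeat, now=None):
    """Data nodes whose last heartbeat is older than the timeout are declared dead."""
    now = now if now is not None else time.time()
    return {node for node, ts in last_heartbeat.items()
            if now - ts > HEARTBEAT_TIMEOUT_S}

def blocks_to_replicate(block_locations, dead_nodes):
    """Blocks whose live replica count fell below the replication factor."""
    under_replicated = []
    for block, nodes in block_locations.items():
        live = [n for n in nodes if n not in dead_nodes]
        if len(live) < REPLICATION_FACTOR:
            under_replicated.append((block, REPLICATION_FACTOR - len(live)))
    return under_replicated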
Q.5 How does divide and conquer strategy related to MapReduce paradigm ?
AU : May-18
Ans. : In the divide and conquer strategy, a computational problem is divided into smaller parts that are executed independently; once all parts are completed, their results are combined to obtain the desired solution of the problem.
MapReduce takes a set of input <key, value> pairs and produces a set of output <key, value> pairs by supplying data through map and reduce functions. The typical MapReduce operations are shown in Fig. 5.2.
Fig. 5.2 : MapReduce operations
In MapReduce, the mapper applies the divide approach, where the input data is split into blocks and each block is represented as an input key-value pair. The unit of work in MapReduce is a job. During the map phase, the input data is divided into input splits for analysis, where each split is an independent task; these tasks run in parallel across the Hadoop cluster. A map function is applied to each input key/value pair, performs some user-defined processing and emits new key/value pairs to intermediate storage to be processed by the reducer. The reducer applies the conquer approach to combine the results : the reduce phase takes the output obtained from the mappers as its input, and a reduce function is applied in parallel to all values corresponding to each unique map key, generating a single output key/value pair.
Q.6 How MapReduce framework executes user jobs ? AU : Dec.-18
Ans. : The unit of work in MapReduce is a job. During the map phase, the input data is divided into input splits for analysis, where each split is an independent task. These
Features of MapReduce
The different features provided by MapReduce are explained as follows :
Synchronization : MapReduce supports the execution of concurrent tasks. When concurrent tasks are executed, they need synchronization, which is provided by reading the state of each MapReduce operation during execution and using shared variables for it.
Data locality : In MapReduce, although the data resides on different cluster nodes, it appears local to the user's application. To obtain the best results, the code and the data of an application should reside on the same machine.
Error handling : The MapReduce engine provides different fault tolerance mechanisms in case of failure. When tasks running on different cluster nodes fail, the MapReduce engine finds those incomplete tasks and reschedules them for execution on other nodes.
Scheduling : MapReduce involves map and reduce operations that divide large problems into smaller chunks, which are run in parallel by different machines. Hence different tasks must be scheduled on computational nodes on a priority basis, which is taken care of by the MapReduce engine.
Q.8 Enlist the features of Virtual Box.
Ans. : VirtualBox provides the following main features :
It supports fully paravirtualized environments along with hardware virtualization.
It provides device drivers in its driver stack that improve the performance of virtualized input/output devices.
It provides shared folder support to copy data from the host OS to the guest OS and vice versa.
It has the latest virtual USB controller support.
It facilitates a broad range of virtual network driver support, along with host-only, bridged and NAT modes.
It supports the Remote Desktop Protocol to connect to a Windows virtual machine (guest OS) remotely on a thin, thick or mobile client seamlessly.
It supports virtual disk formats used by both VMware and Microsoft Virtual PC hypervisors.
Q.9 Describe Google App Engine.
distribute workload around the globe, move data between disparate networks and implement innovative security models for user access to cloud resources. In federated clouds, the cloud resources are provisioned through network gateways that connect public or external clouds with private or internal clouds owned by a single entity and/or community clouds owned by several co-operating entities.
Q.15 Mention the importance of Transport Level Security (TLS). AU : Dec.-16
Ans. : Transport Layer Security (TLS) is designed to provide security at the transport layer. TLS was derived from a security protocol called Secure Sockets Layer (SSL). TLS ensures that no third party can eavesdrop on or tamper with any message.
The benefits of TLS, illustrated by the short example that follows, are :
a. Encryption : TLS/SSL can help to secure transmitted data using encryption.
b. Interoperability : TLS/SSL works with most web browsers, including Microsoft Internet Explorer, and on most operating systems and web servers.
c. Algorithm Flexibility : TLS/SSL provides options for the authentication mechanisms, encryption algorithms and hashing algorithms that are used during the secure session.
d. Ease of Deployment : Many applications can deploy TLS/SSL transparently, for example on a Windows Server 2003 operating system.
e. Ease of Use : Because TLS/SSL is implemented beneath the application layer, most of its operations are completely invisible to the client.
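As a brief illustration of these points, the following Python snippet opens a TLS-protected connection using only the standard library; example.com is just a placeholder host.

import socket
import ssl

context = ssl.create_default_context()            # certificate verification on by default

with socket.create_connection(("example.com", 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="example.com") as tls_sock:
        print(tls_sock.version())                 # negotiated protocol, e.g. 'TLSv1.3'
        print(tls_sock.getpeercert()["subject"])  # authenticated server identity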
Q.16 Enlist the features of extensible messaging & presence protocol for cloud
computing.
Ans. : The features of extensible messaging & presence protocol for cloud computing
are :
a. It is decentralized and supports easy two-way communication.
b. It doesn’t require polling for synchronization.
c. It has built-in publish subscribe (pub-sub) functionality.
d. It works on XML based open standards.
e. It is perfect for instant messaging features and custom cloud services.
f. It is efficient and scales up to millions of concurrent users on a single service.
g. It supports worldwide federation models
h. It provides strong security using Transport Layer Security (TLS) and Simple
Authentication and Security Layer (SASL).
i. It is flexible and extensible.
Q.3 Explain the Hadoop Distributed File System architecture with a diagram.
AU : Dec.-18
Ans. : Refer section 5.2 and 5.2.1.
Q.4 Elaborate HDFS concepts with suitable diagram. AU : May-17
Ans. : Refer section 5.2 and 5.2.1.
OR Illustrate the design of Hadoop file system. AU : Dec.-19
Ans. : Refer section 5.2 and 5.2.1.
Q.5 Illustrate dataflow in HDFS during file read/write operation with suitable
diagrams. AU : Dec.-17
Ans. : HDFS follows a master-slave architecture using a name node and data nodes. The Name node acts as the master while multiple Data nodes work as slaves. HDFS is implemented as a block-structured file system where files are broken into blocks of fixed size and stored on the Hadoop cluster. The HDFS architecture is shown in Fig. 5.4.
across the cluster. The name node serves as the single arbitrator and repository for HDFS metadata, which is kept in main memory for faster random access. The entire file system namespace is contained in a file called FsImage stored on the name node's local file system, while the transaction log is recorded in the EditLog file.
2. Data Node
In HDFS, multiple data nodes exist that manage the storage attached to the nodes they run on. They are usually used to store users' data on HDFS clusters. Internally, a file is split into one or more blocks that are stored on data nodes. The data nodes are responsible for handling read/write requests from clients. They also perform block creation, deletion and replication upon instruction from the name node. A data node stores each HDFS data block in a separate file, and the blocks of a file are stored on different data nodes. Such a block-structured file system requires the file metadata to be stored, managed and accessed reliably.
The representation of name node and data node is shown in Fig. 5.5.
3. HDFS Client
In Hadoop distributed file system, the user applications access the file system using
the HDFS client. Like any other file systems, HDFS supports various operations to read,
write and delete files, and operations to create and delete directories. The user
references files and directories by paths in the namespace. The user application does
not need to be aware that the file system metadata and storage are on different servers, or that
blocks have multiple replicas. When an application reads a file, the HDFS client first
asks the name node for the list of data nodes that host replicas of the blocks of the file.
The client contacts a data node directly and requests the transfer of the desired block.
When a client writes, it first asks the name node to choose data nodes to host replicas of
the first block of the file. The client organizes a pipeline from node-to-node and sends
the data. When the first block is filled, the client requests new data nodes to be chosen
to host replicas of the next block. The choice of data nodes for each block is likely to be
different.
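At the application level, this read/write interaction is hidden behind ordinary file-like APIs. The following sketch uses pyarrow's HDFS binding; it assumes pyarrow with libhdfs and the Hadoop client libraries are available, and the NameNode host and paths shown are placeholders.

from pyarrow import fs

hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020)

# Write : the client asks the name node for target data nodes and streams the
# blocks through the data-node pipeline behind this stream API.
with hdfs.open_output_stream("/user/demo/sample.txt") as out:
    out.write(b"hello hdfs\n")

# Read : the client obtains the block locations from the name node and then
# pulls the data directly from a data node holding a replica.
with hdfs.open_input_stream("/user/demo/sample.txt") as src:
    print(src.read())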
4. HDFS Blocks
In general, a user's data is stored in HDFS in terms of blocks. The files in the file system are divided into one or more segments called blocks. The default size of an HDFS block is 64 MB, which can be increased as per need.
A. Read Operation in HDFS
The Read Operation in HDFS is shown in Fig. 5.6 and explained as follows.
6. Once the end of a block is reached, DFSInputStream closes the connection and moves on to locate the next Data Node for the next block.
7. Once the client is done with reading, it calls the close() method.
The Write Operation in HDFS is shown in Fig. 5.7 and explained as follows :
1. A client initiates the write operation by calling the 'create()' method of the Distributed File System object, which creates a new file - step 1 in the above diagram.
2. The Distributed File System object connects to the Name Node using an RPC call and initiates new file creation. However, this file create operation does not associate any blocks with the file. It is the responsibility of the Name Node to verify that the file (which is being created) does not already exist and that the client has the correct permissions to create a new file. If the file already exists or the client does not have sufficient permission to create a new file, then an IOException is thrown to the client. Otherwise, the operation succeeds and a new record for the file is created by the Name Node.
3. Once a new record in Name Node is created, an object of type
FSDataOutputStream is returned to the client. A client uses it to write data into
the HDFS. Data write method is invoked (step 3 in the diagram).
Ans. : The MapReduce takes a set of input <key, value> pairs and produces a set of
output <key, value> pairs by supplying data through map and reduce functions. Every
MapReduce program undergoes different phases of execution. Each phase has its own
significance in MapReduce framework. The different phases of execution in
MapReduce are shown in Fig. 5.8 and explained as follows.
Let us take the example of a word count application where the input is a set of words. The input to the mapper has three sets of words : [Deer, Bear, River], [Car, Car, River] and [Deer, Car, Bear]. These three sets are taken arbitrarily as input to the MapReduce process. The various stages in MapReduce for the word count application are shown in Fig. 5.9.
In the input phase, the large data set in the form of <key, value> pairs is provided as standard input for the MapReduce program. The input files used by MapReduce are kept on the HDFS (Hadoop Distributed File System) store, with a standard InputFormat specified by the user.
Once the input file is selected, the splitting phase reads the input data and divides it into smaller chunks, such as [Deer, Bear, River], [Car, Car, River] and [Deer, Car, Bear] as separate sets. The split chunks are then given to the mapper.
The mapper performs the map operation, extracts the relevant data and generates intermediate key-value pairs. It reads the input data from a split using a record reader and generates intermediate results like [Deer:1; Bear:1; River:1], [Car:1; Car:1; River:1] and [Deer:1; Car:1; Bear:1]. It transforms the input key-value list into an output key-value list, which is then passed to the combiner.
The shuffle and sort are components of the reducer. Shuffling is the process of partitioning and moving the mapped output to the reducers, where intermediate keys are assigned to a reducer. Each partition is called a subset, and each subset becomes input to a reducer. In general, the shuffle phase ensures that the partitioned splits reach the appropriate reducers, where each reducer uses the HTTP protocol to retrieve its own partition from the mappers. The output of this stage would be [Deer:1, Deer:1], [Bear:1, Bear:1], [River:1, River:1] and [Car:1, Car:1, Car:1].
The sort phase is responsible for sorting the intermediate keys on a single node automatically before they are presented to the reducer. The shuffle and sort phases occur simultaneously, as the mapped outputs are fetched and merged. It sorts all intermediate results alphabetically, like [Bear:1, Bear:1], [Car:1, Car:1, Car:1], [Deer:1, Deer:1] and [River:1, River:1]. The combiner is used between the mapper and the reducer to reduce the volume of data transferred; it is also known as a semi-reducer, which accepts input from the mapper and passes output key-value pairs to the reducer. The output of this stage would be [Bear:2], [Car:3], [Deer:2] and [River:2].
The reducer reduces the set of intermediate values that share a key to a smaller set of values. The reducer uses the sorted input to generate the final output, which is written by the reducer using a record writer into the output file with a standard output format, like [Bear:2, Car:3, Deer:2, River:2]. The final output of each MapReduce program is generated as key-value pairs written to an output file, which is written back to the HDFS store. The word count process using MapReduce, with all phases of execution, is illustrated in Fig. 5.9.
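The phases above can be simulated with a few lines of plain Python for the same [Deer, Bear, River] example; this only illustrates the data flow, not Hadoop's actual MapReduce engine.

from collections import defaultdict

splits = [["Deer", "Bear", "River"], ["Car", "Car", "River"], ["Deer", "Car", "Bear"]]

# Map phase : each split independently emits intermediate <word, 1> pairs.
mapped = [[(word, 1) for word in split] for split in splits]

# Shuffle and sort : group all intermediate values by key across the mappers.
grouped = defaultdict(list)
for split_output in mapped:
    for word, count in split_output:
        grouped[word].append(count)

# Reduce phase : aggregate the values of each unique key into the final counts.
result = {word: sum(counts) for word, counts in sorted(grouped.items())}
print(result)    # {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}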
Q.8 Explain the functional architecture of the Google cloud platform for app
engine in detail.
Ans. : Refer section 5.5.
Q.11 Explain the significance of Big table along with its working.
Ans. : Refer section 5.6.2.