
Dr. AMBEDKAR INSTITUTE OF TECHNOLOGY
OUTER RING ROAD, MALLATHALLI, BENGALURU
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Cloud Computing –Kai Hwang


Unit 4

Features of Cloud and Grid Platforms


Programming Support of Google APP Engine:
Programming on Amazon AWS and Microsoft AZURE
Open Source cloud models

Features of Cloud and Grid Platforms: Cloud Capabilities and Platform
Features, Traditional Features Common to Grids and Clouds, Data Features
and Databases, Programming and Runtime Support, Programming Support
of Google APP Engine: Programming the Google App Engine, Google File
System (GFS), BigTable, Google’s NOSQL System, Chubby, Google’s
Distributed Lock Service, Programming on Amazon AWS and Microsoft
AZURE: Programming on Amazon EC2, Amazon Simple Storage Service (S3),
Amazon Elastic Block Store (EBS) and SimpleDB, Microsoft Azure
Programming Support, Emerging Cloud Software Environments: Open
Source Eucalyptus and Nimbus, OpenNebula, Sector/Sphere, and
OpenStack, Manjrasoft Aneka Cloud and Appliances.
FEATURES OF CLOUD AND GRID PLATFORMS

1.Cloud Capabilities and Platform Features


1. Clouds provide cost-effective utility computing with the elasticity to scale up and down in power.
2. Azure (PaaS): Azure Table, queues, blobs, SQL Database, and Web and Worker roles.
3. Amazon (IaaS): SimpleDB, queues, notification, monitoring, content delivery network, relational database, and MapReduce (Hadoop).
4. Google: GAE, a powerful web application development environment.
Important cloud platform capabilities
 Physical/virtual platform
 Massive data storage, distributed file system
 Workflow and data query language support
 Programming interface and service deployment
 Runtime support and services
Infrastructure Cloud Services
1.Accounting
2.Appliances
3.Authentication and Authorization
4.Data transport
5.Operating system
6.Program Library
7.Registry
8.Security
9.Scheduling
10.Gang Scheduling
11.SAAS
12.Virtualization.
2. Traditional Features Common to Grids and Clouds

1. Cluster management
2. Data management
3. Grid programming environment
4. OpenMP/threading
5. Portals
6. Scalable parallel computing environments
7. Virtual organizations
8. Workflow
2. Traditional Features Common to Grids and Clouds
1. Workflow: a platform that orchestrates and executes multiple cloud and non-cloud services on demand in real applications.
2. Data transport: high-bandwidth links will be made available between clouds and the TeraGrid.
3. Security, Privacy, and Availability
• Use special APIs for authenticating users and sending e-mail using commercial accounts.
• Cloud resources are accessed with security protocols such as HTTPS and SSL.
• Fine-grained access control is desired to protect data integrity and deter intruders or hackers.
• Shared data sets are protected from malicious alteration, deletion, or copyright violations.
• Availability is enhanced and disaster recovery supported with live migration of VMs.
• Use a reputation system to protect data centers.
3. Data Features and Databases
1. Program library: VM image libraries for academic and business clouds.
2. Blobs and drives: the basic storage, blobs for Azure and S3 for Amazon.
3. DPFS (distributed, data-parallel file systems): Google File System (used with MapReduce), HDFS (Hadoop).
4. SQL and relational databases: offered as cloud services (e.g., Amazon's relational database service and SQL Azure).
5. Table and NOSQL non-relational databases: BigTable in Google, SimpleDB in Amazon, and Azure Table in Azure.
6. Queuing services: robust messaging between the components of an application.
 Messages are short (< 8 KB), use a Representational State Transfer (REST) interface, and have "deliver at least once" semantics.
4.Programming and Runtime Support

1. Worker and Web Roles
Worker roles: schedulable processes that are automatically launched.
Web roles: provide the approach to portals (web hosting front ends).
2. MapReduce
 "Map" functions execute over different data samples and the results are then reduced.
 Offered as Hadoop on Amazon and Dryad on Azure.
3. Cloud Programming Models
GAE (Google) and Dryad (Microsoft). Classic Unix text-processing tools illustrate the data-parallel programming style:
Grep: global search for a regular expression
Sed: stream editor
Perl: report processing
Awk: pattern scanning and processing
PROGRAMMING SUPPORT OF GOOGLE APP ENGINE

GAE supports two languages: Java and Python.
The client environment includes an Eclipse plug-in for Java that allows developers to debug GAE applications on a local machine.
 GWT (Google Web Toolkit) is available for Java web application developers; GAE supports JVM-based interpreters and compilers.
 For Python, frameworks such as Django and CherryPy can be used to build web applications.
PROGRAMMING SUPPORT OF GOOGLE APP ENGINE

GAE provides powerful constructs for storing and accessing data.
The data store is a NOSQL data management system; entities are at most 1 MB in size and are labeled by a set of schema-less properties.
 Queries can retrieve entities of a given kind, filtered and sorted by the values of those properties.
PROGRAMMING SUPPORT OF GOOGLE APP ENGINE

For Java, GAE offers the Java Data Objects (JDO) and Java Persistence API (JPA) interfaces.
 Python uses an SQL-like query language called GQL.
The data store is strongly consistent and uses optimistic concurrency control.
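A minimal sketch of how a datastore model and GQL query might look on the classic GAE Python runtime; the Greeting model and its properties are illustrative, not from the slides:

```python
# Illustrative GAE datastore usage (classic Python runtime); the Greeting model is hypothetical.
from google.appengine.ext import db

class Greeting(db.Model):
    author = db.StringProperty()
    content = db.StringProperty(multiline=True)
    date = db.DateTimeProperty(auto_now_add=True)

# Store an entity (schema-less properties, up to 1 MB per entity).
Greeting(author='alice', content='hello from GAE').put()

# GQL: SQL-like queries filtered and sorted by property values.
recent = db.GqlQuery(
    "SELECT * FROM Greeting WHERE author = :1 ORDER BY date DESC", 'alice')
for g in recent.fetch(10):
    print g.content
```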
PROGRAMMING SUPPORT OF GOOGLE APP ENGINE
1.Programming the Google App Engine

GAE applications assign entities to entity groups when the entities are created.
Data store performance can be improved by using memcache.
Google provides the Blobstore for large files, with a per-file limit of 2 GB.
Cron is used to configure regularly scheduled tasks.
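As an illustration of using memcache to speed up repeated reads, a sketch against the classic GAE Python API; the cache key and the rendering function are made up:

```python
# Sketch: caching an expensive lookup in memcache (classic GAE Python API).
from google.appengine.api import memcache

def render_homepage():
    # Stand-in for an expensive datastore query or template render.
    return '<html><body>Unit 4 notes</body></html>'

def get_homepage_html():
    html = memcache.get('homepage_html')               # hypothetical cache key
    if html is None:
        html = render_homepage()
        memcache.set('homepage_html', html, time=60)   # cache for 60 seconds
    return html
```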
PROGRAMMING SUPPORT OF GOOGLE APP ENGINE
1.Programming the Google App Engine

 Google SDC (Secure Data Connection) can tunnel through the Internet and link an intranet to an external GAE application.
 URL Fetch: the ability for applications to fetch resources and communicate with other hosts over the Internet using HTTP and HTTPS requests.
 A specialized mail mechanism is provided to send e-mail from a GAE application.
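A short sketch of the URL Fetch and Mail services using the classic GAE Python APIs; the URL and e-mail addresses are placeholders:

```python
# Sketch: URL Fetch and Mail services in a classic GAE Python application.
from google.appengine.api import urlfetch, mail

# Fetch a resource over HTTPS from another host.
result = urlfetch.fetch('https://www.example.com/data.json')   # placeholder URL
if result.status_code == 200:
    payload = result.content

# Send e-mail from the application (the sender must be an authorized account).
mail.send_mail(sender='admin@example-app.appspotmail.com',     # placeholder sender
               to='student@example.com',                       # placeholder recipient
               subject='Report ready',
               body='Your report has been generated.')
```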
PROGRAMMING SUPPORT OF GOOGLE APP ENGINE
1.Programming the Google App Engine

An application can use Google Accounts for user authentication.
Google Accounts handles user account creation and sign-in, and a user who already has a Google account (such as a Gmail account) can use that account with your app.
 GAE also provides the ability to manipulate image data: applications can resize, rotate, flip, crop, and enhance images.
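A sketch combining the Users and Images services (classic GAE Python APIs); the image file name is a placeholder for any raw image bytes available to the handler:

```python
# Sketch: Google Accounts sign-in and image manipulation in classic GAE Python.
from google.appengine.api import users, images

user = users.get_current_user()
if user is None:
    login_url = users.create_login_url('/')    # redirect the visitor here to sign in
else:
    greeting = 'Hello, %s' % user.nickname()

# Raw bytes of an image bundled with the app (placeholder file name).
image_data = open('logo.png', 'rb').read()
thumbnail = images.resize(image_data, width=120, height=80)
rotated = images.rotate(image_data, 90)
```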
2.Google File System (GFS)

Storage service
GFS is the fundamental storage service for Google's search engine.
The size of the web data that is crawled and saved is very large.
Google uses a distributed file system to redundantly store massive amounts of data on cheap and unreliable computers.
GFS was designed for Google applications, and Google applications were built for GFS.
2.Google File System (GFS)

File size
GFS files are typically 100 MB or larger, and multi-gigabyte files are common.
 Google has chosen a file data block (chunk) size of 64 MB,
 versus roughly 4 KB in typical traditional file systems.
2.Google File System (GFS)

I/O system
Files are written once, and write operations append data blocks to the end of files.
Multiple appending operations may be concurrent.
The workload consists of many large streaming reads and only a little random access.
 For large streaming reads, GFS favors high sustained throughput over low latency.
2.Google File System (GFS)

Reliability
 Reliability is achieved through replication: each chunk (data block) of a file is replicated across three or more chunk servers.
 A single master coordinates access as well as keeping the metadata.
2.Google File System (GFS)

 There is a single master in the whole cluster.
 Other nodes act as chunk servers for storing data, while the single master stores the metadata.
 The master manages the file system namespace and locking facilities.
2.Google File System (GFS)

 The master periodically communicates with the chunk servers to:
 collect management information, and
 give instructions to the chunk servers, such as load balancing or failure recovery.
 The master therefore has enough information to keep the whole cluster in a healthy state.
 A single master can handle a cluster of more than 1,000 nodes.
Limitations
1. The master must keep the whole cluster in a healthy state.
2. Potential weakness: the single master can become a performance bottleneck and a single point of failure.
2.Google File System (GFS)
Solution
 Google uses a shadow master to replicate all the data on the master.
 The design guarantees that all data operations are performed directly between the client and the chunk server.
 Only control messages are transferred between the master and the clients, and they can be cached for future use.
Data mutation (write and append operations) in GFS
• Data blocks must be created for all replicas.
• The goal is to minimize involvement of the master.
1. The client asks the master which chunk server holds the current lease for the chunk and the locations of the other replicas. If no one has a lease, the master grants one to a replica it chooses.
2. The master replies with the identity of the primary and the locations of the other (secondary) replicas. The client caches this data for future mutations; it needs to contact the master again only when the primary becomes unreachable or replies that it no longer holds a lease.
3. The client pushes the data to all the replicas. The client can do so in any order. Each chunk server stores the data in an internal LRU buffer cache until the data is used or aged out.
4. Once all the replicas have acknowledged receiving the data, the client sends a write request to the primary.
5. The primary forwards the write request to all secondary replicas. Each secondary replica applies mutations in the same serial number order assigned by the primary.
6. The secondaries all reply to the primary indicating that they have completed the operation.
7. The primary replies to the client.


3. BigTable, Google's NOSQL System
1. BigTable provides storage and retrieval of structured and semistructured data.
2. BigTable applications include storage of web pages, per-user data, and geographic locations.
3. Web pages are keyed by URL; per-user data includes user preference settings, recent queries/search results, and the user's e-mails.
3.BigTable, Google’s NOSQL System

4. Geographic locations are used by the Google Earth software.
5. Geographic location data includes physical entities (shops, restaurants, etc.), roads, satellite image data, and user annotations.
Design and implementation of the BigTable system

The database - very high read/write rates and one-to-


one /one-to-many data sets.
The application may need to examine data changes
over time-updation.
It provides a fault-tolerant and persistent database
as in a storage service.
Design and implementation of the BigTable system

Scale: terabytes of in-memory data, petabytes of disk-based data,
millions of reads/writes per second, and efficient scans.
Self-managing: scaling and automatic load balancing.
BigTable is used by Google Search, Orkut, and Google Maps/Google Earth.
A BigTable cell manages roughly 200 TB of data.
Design and implementation of the BigTable system

BigTable uses the following building blocks:
1. GFS: stores persistent state.
2. Scheduler: schedules the jobs involved in BigTable serving.
3. Lock service (Chubby): master election and location bootstrapping.
4. MapReduce: often used to read/write BigTable data.
Tablet Location Hierarchy
1. A table is split into row ranges; each row range is called a "tablet".
2. The first level is a file stored in Chubby that contains the location of the root tablet.
3. The root tablet holds the location of all tablets of a METADATA table.
4. Each METADATA tablet holds the location of a set of user tablets.
5. The root tablet is just the first tablet in the METADATA table.
Chubby, Google’s Distributed Lock Service
1. Chubby is a coarse-grained locking service.
2. The files it stores are small compared to the huge files in GFS.
Paxos is an algorithm used to achieve consensus among a distributed set of computers that communicate over an asynchronous network.
3. Based on the Paxos agreement protocol, the Chubby system remains reliable despite the failure of any member node.
Chubby, Google’s Distributed Lock Service

4. Clients use the Chubby library to talk to the servers in a Chubby cell.
5. Client applications can perform various file operations on any server in the Chubby cell.
6. The servers run the Paxos protocol to keep the file system reliable and consistent.
Programming on Amazon EC2

Amazon was the first company to introduce VMs for application hosting.
Customers can rent VMs instead of physical machines to run their own applications.
By using VMs, customers can load any software of their choice.
Programming on Amazon EC2

The elastic feature allows users to create, launch, and terminate server instances as needed.
Amazon provides several types of preinstalled VMs.
 Instances are created from Amazon Machine Images (AMIs), which are preconfigured with operating systems based on Linux or Windows, plus additional software.
The workflow to create a VM:
the sequence is supported by public, private, and paid AMIs;
the AMIs are formed from the virtualized compute, storage, and server resources.
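A minimal boto3 sketch of the launch workflow; the AMI ID, key pair, and region are placeholders you would substitute:

```python
# Sketch: launching and later terminating an EC2 instance with boto3.
import boto3

ec2 = boto3.resource('ec2', region_name='us-east-1')       # placeholder region

instances = ec2.create_instances(
    ImageId='ami-0123456789abcdef0',   # placeholder AMI (public, private, or paid)
    InstanceType='t2.micro',
    KeyName='my-keypair',              # placeholder key pair for SSH access
    MinCount=1,
    MaxCount=1,
)
instance = instances[0]
instance.wait_until_running()
instance.reload()
print(instance.id, instance.public_dns_name)

# The elastic feature: terminate the server instance when it is no longer needed.
instance.terminate()
```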
Amazon Simple Storage Service (S3)

1. S3 provides a simple web services interface to store and retrieve any amount of data, at any time, from anywhere on the web.
2. It is an object-oriented storage service for users.
3. It offers a REST (Web 2.0) interface and a SOAP interface.
Amazon Simple Storage Service (S3)
The fundamental unit of S3 is called an object.
Each object is stored in a bucket and retrieved with a unique, developer-assigned key.
A bucket is the container of objects.
An object holds values, metadata, and access control information.
Amazon Simple Storage Service (S3)

 From a programmer's perspective, S3 is a coarse-grained key-value store.
 Users can write, read, and delete objects containing from 1 byte to 5 gigabytes of data each.
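A boto3 sketch of the bucket/key/object model; the bucket and key names are placeholders:

```python
# Sketch: writing, reading, and deleting an S3 object with boto3.
import boto3

s3 = boto3.client('s3')
bucket, key = 'unit4-demo-bucket', 'notes/lecture4.txt'   # placeholder names

s3.create_bucket(Bucket=bucket)                     # bucket = container of objects
s3.put_object(Bucket=bucket, Key=key,               # key = developer-assigned name
              Body=b'Features of cloud and grid platforms')

body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
print(body)

s3.delete_object(Bucket=bucket, Key=key)
```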
Key features of S3:

Redundant storage through geographic dispersion.
 99.999999999% durability and 99.99% availability of objects over a given year; cheaper reduced redundancy storage (RRS) is also offered.
Authentication ensures that data is kept secure from unauthorized access.
Objects can be made private or public, and rights can be granted to specific users.
Key features of S3
 Per-object URLs and ACLs (access control lists).
 Default download protocol of HTTP.
 Storage costs range from about $0.055 (≈ ₹4.58) per GB per month for more than 5,000 TB to $0.15 (≈ ₹12.5) per GB per month, depending on the total amount stored.
 There is no data transfer charge for data transferred between Amazon EC2 and Amazon S3 within the same region, or between the Amazon EC2 Northern Virginia region and the Amazon S3 U.S. Standard region.
Amazon Elastic Block Store (EBS)

EBS provides a block-level storage interface for saving and restoring the virtual images of EC2 instances.
Traditionally, EC2 instances were destroyed (and their state lost) after use.
 The state of an EC2 instance can now be saved in the EBS system after the machine is shut down.
Amazon Elastic Block Store (EBS)

 Users can use EBS to save persistent data and mount EBS volumes to running EC2 instances.
 Volumes range from 1 GB to 1 TB in size and appear as block devices to EC2 instances.
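A boto3 sketch of creating an EBS volume and attaching it to a running instance; the availability zone, instance ID, and device name are placeholders:

```python
# Sketch: creating and attaching an EBS volume with boto3.
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')           # placeholder region

vol = ec2.create_volume(Size=10,                              # size in GiB
                        AvailabilityZone='us-east-1a',        # placeholder AZ
                        VolumeType='gp2')

ec2.get_waiter('volume_available').wait(VolumeIds=[vol['VolumeId']])

# Attach the volume so the instance can mount it and keep persistent data.
ec2.attach_volume(VolumeId=vol['VolumeId'],
                  InstanceId='i-0123456789abcdef0',           # placeholder instance
                  Device='/dev/sdf')
```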
SimpleDB

SimpleDB provides a simplified data model based on the relational database model.
Structured data from users must be organized into domains.
 Each domain can be considered a table.
Items are the rows of the table.
SimpleDB
A cell in the table is identified by an item (row) and the column name of the corresponding attribute.
 Multiple values can be assigned to a single cell in the table.
This is not permitted in a traditional relational database, which must maintain data consistency.
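A boto3 sketch of SimpleDB's multi-valued cells; the domain, item, and attribute names are placeholders. Note that the same attribute name appears twice, which a relational table would not allow:

```python
# Sketch: multi-valued attributes in SimpleDB via boto3.
import boto3

sdb = boto3.client('sdb', region_name='us-east-1')            # placeholder region
sdb.create_domain(DomainName='products')                       # domain ~ table

# One item (row) whose 'color' cell holds two values.
sdb.put_attributes(
    DomainName='products',
    ItemName='item-001',                                       # item ~ row
    Attributes=[
        {'Name': 'color', 'Value': 'red'},
        {'Name': 'color', 'Value': 'blue'},
        {'Name': 'size', 'Value': 'M'},
    ],
)

result = sdb.select(SelectExpression="select * from products")
print(result.get('Items', []))
```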
Microsoft Azure Programming Support

 Azure Service Fabric is a Platform as a Service (PaaS) provided by Microsoft for deploying microservices-based cloud applications.
 Microservices enable an organization to deliver large, complex applications rapidly, frequently, reliably, and sustainably.
 Azure Service Fabric is a distributed systems platform that makes it easier to package, deploy, and manage dependable and scalable microservices.
Microsoft Azure Programming Support

1. The Azure fabric provides virtualized hardware with dynamic assignment of resources and fault tolerance.
2. Applications and their roles are described with XML templates (service definition and configuration files).
Microsoft Azure Programming Support
3. When the system is running, services are monitored and one can access event logs, trace/debug data, performance counters, IIS (Internet Information Services) web server logs, crash dumps, and other log files.
4. This information can be saved in Azure storage.
• An Azure application runs in customized compute VMs: the web role supports basic Microsoft web hosting.
• The other important compute class is the worker role, reflecting the importance in cloud computing of a pool of compute resources that are scheduled as needed.

The roles support HTTP(S) and TCP. Roles offer the following methods:
 OnStart(): called by the fabric on startup; allows initialization tasks to be performed.
 OnStop(): called when the role is to be shut down; gives a graceful exit.
 Run(): contains the main logic of the role.
SQL Azure
SQL Server offered as a service.
 REST interfaces are used.
 The REST interfaces are automatically associated with URLs, and all storage is replicated three times for fault tolerance and is guaranteed to be consistent in access.
Azure Storage
Azure Storage is similar to S3 for Amazon.
Blobs are arranged as a three-level hierarchy:
 Account → Containers → Page or Block Blobs.
Containers are similar to the directories (roots) of traditional file systems.
Azure Storage: Blobs
Block blobs are targeted at streaming data; each block is up to 4 MB and has a 64-byte ID.
A block blob can be at most 200 GB in size.
Azure Storage: Blobs
Page blobs are used for random read/write access and consist of an array of pages, with a maximum blob size of 1 TB.
Blob metadata is stored as <name, value> pairs, up to 8 KB per blob.
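A sketch with the azure-storage-blob Python SDK mirroring the Account → Container → Blob hierarchy; the connection string and names are placeholders:

```python
# Sketch: Account -> Container -> Block blob with the azure-storage-blob SDK.
from azure.storage.blob import BlobServiceClient

conn_str = '<storage-account-connection-string>'              # placeholder
service = BlobServiceClient.from_connection_string(conn_str)  # account level

container = service.get_container_client('unit4-demo')        # container level
container.create_container()

# Upload a (small) block blob and read it back.
container.upload_blob(name='notes.txt', data=b'hello azure', overwrite=True)
data = container.download_blob('notes.txt').readall()
print(data)
```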
Azure Tables and Queues
Tables and queues hold smaller amounts of data.
 Queues provide reliable message delivery between web and worker roles.
 Queue messages are retrieved and processed at least once, with an 8 KB limit on message size.
PUT, GET, and DELETE are message operations;
 CREATE and DELETE apply to queues themselves.
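A sketch of the at-least-once message flow with the azure-storage-queue SDK; the connection string and queue name are placeholders:

```python
# Sketch: send, receive, and delete queue messages (azure-storage-queue SDK).
from azure.storage.queue import QueueClient

conn_str = '<storage-account-connection-string>'               # placeholder
queue = QueueClient.from_connection_string(conn_str, 'tasks')  # placeholder queue
queue.create_queue()

queue.send_message('resize image 42')      # message body must stay under the 8 KB limit

# Messages are delivered at least once: delete only after processing succeeds.
for msg in queue.receive_messages():
    work_item = msg.content                # handle the work item here
    queue.delete_message(msg)
```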
Azure Tables
Rows are called entities and columns are called properties.
An entity can have up to 255 general properties, stored as <name, type, value> triples.
 Each entity also has two extra properties, PartitionKey and RowKey.
Azure Tables

RowKey-each entity a unique label


PartitionKey shared and entities with the
same PartitionKey are stored next to each other;
Good use of PartitionKey  speed up search
performance.
An entity - 1 MB storage;
EMERGING CLOUD SOFTWARE ENVIRONMENTS
1. Open Source Eucalyptus and Nimbus
Eucalyptus brings cloud computing to academic supercomputers and clusters.
It exposes an AWS-compliant, EC2-based web service interface for interacting with the cloud service.
Eucalyptus also provides services such as the AWS-compliant Walrus storage service and a user interface for managing users and images.
Walrus ("WS3") is similar to S3.
1. Eucalyptus Architecture
1. An open software environment for building private and hybrid clouds.
2. The system supports cloud programmers in VM image management.
3. The Cloud Controller (CLC) is the entry point into the cloud for administrators and users.
4. The Cluster Controller (CC), written in C, acts as the front end for a cluster within a Eucalyptus cloud and communicates with the Storage Controller and Node Controller.
5. The Storage Controller (SC), written in Java, is the equivalent of AWS EBS.
6. It communicates with the Cluster Controller and Node Controller and manages Eucalyptus block volumes and snapshots for the instances within its specific cluster.
VM Image Management
Eucalyptus manages VM images in a way similar to Amazon EC2.
Any user can upload a VM image and then register it.
The image is uploaded into a user-defined bucket within Walrus and can be retrieved at any time from any availability zone.
 Eucalyptus is available in both a commercial proprietary version and an open source version.
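Because Eucalyptus exposes an AWS-compliant EC2 interface, standard AWS tooling can usually be pointed at it. A hedged boto3 sketch, assuming a hypothetical endpoint URL and placeholder credentials for a private Eucalyptus installation:

```python
# Sketch: talking to an AWS-compatible private cloud (e.g., Eucalyptus) with boto3.
import boto3

ec2 = boto3.client(
    'ec2',
    endpoint_url='https://euca.example.edu:8773/services/compute',  # hypothetical endpoint
    aws_access_key_id='EUCA_ACCESS_KEY',                            # placeholder credentials
    aws_secret_access_key='EUCA_SECRET_KEY',
    region_name='eucalyptus',
)

# List registered machine images, just as one would list AMIs on AWS.
for image in ec2.describe_images()['Images']:
    print(image['ImageId'], image.get('Name'))
```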
2. Nimbus
1. Nimbus is a set of open source tools that provide IaaS capabilities.
2. A client can lease remote resources by deploying VMs on those resources and configuring them to represent the environment desired by the user.
3. A special web interface, Nimbus Web, exposes administrative and user functions in a friendly interface.
4. Nimbus Web is centered around a Python Django web application.
1. Nimbus supports two resource management strategies.
2. The first is the default "resource pool" mode:
 the service has direct control of a pool of VM manager nodes and assumes it can start VMs on them.
3. The second mode is called "pilot":
 the service makes requests to a cluster's Local Resource Management System (LRMS) to get a VM manager available to deploy VMs.
4. Nimbus also supports Amazon's EC2 interface.
Nimbus
1. Cumulus, the Nimbus storage cloud, can be integrated with the other central services, but can also be used stand-alone.
2. Cumulus is compatible with the Amazon S3 REST API.
3. The Nimbus cloud client uses the Java Jets3t library to interact with Cumulus.
OpenNebula, Sector/Sphere, and OpenStack

OpenNebula
1. OpenNebula is an open source toolkit for building IaaS clouds.
2. It is flexible and modular, allowing integration with different storage and network infrastructure configurations and hypervisor technologies.
3. The core manages the full VM life cycle, including setting up and managing networks and storage (e.g., VM disk image deployment or on-the-fly software environment creation).
4. The capacity manager, or scheduler, governs the functionality provided by the core.
5. The default capacity scheduler is a requirement/rank matchmaker.
6. The last main components are the access drivers.
7. They provide an abstraction of the underlying infrastructure to the monitoring, storage, and virtualization services.
8. The libvirt API is exposed for VM management, along with a command-line interface (CLI).
9. OpenNebula supports live migration and VM snapshots (a snapshot preserves the state and data of a virtual machine at a specific point in time).
10. When local resources are insufficient, a hybrid cloud model can be used.
Sector/Sphere

Sector/Sphere is a software platform that supports very large distributed data storage and simplified processing over large clusters, within a single data center or across multiple data centers.
 It comprises the Sector distributed file system and the Sphere parallel data processing framework.
 Fault tolerance is implemented by replicating data in the file system and managing the replicas.
 Sector is aware of the network topology, which gives better reliability, availability, and access throughput.
Message passing uses UDP; data transfer uses UDT (UDP-based Data Transfer).
 UDT is a reliable, UDP-based, application-level data transport protocol that enables high-speed data transfer over wide-area high-speed networks.
Sector/Sphere Architecture

 Security server: authenticates master servers, slave nodes, and users.
 Master servers: the infrastructure core; they maintain the metadata, schedule jobs, and respond to users' requests.
 Sector supports multiple active masters that can join and leave at runtime and can all manage requests.
Sector/Sphere Architecture

 Slave nodes--Data is stored and processed.


 The slave nodes can be located within a single/multiple
data high-speed network connections.
 Client component- Tools and programming APIs for
accessing and processing Sector data.
Sphere
Sphere is the parallel data processing engine coupled with Sector.
This coupling allows the system to make accurate decisions about job scheduling and data location.
 Sphere provides a programming framework to process data stored in Sector.
A UDF (user-defined function) is run over the input data segments in parallel.
Failed data segments may be restarted on other nodes to achieve fault tolerance.
 In a Sphere application, both inputs and outputs are Sector files.
OpenStack
 OpenStack was founded so that technologists, developers, researchers, and industry can share resources and technologies to build massively scalable and secure cloud infrastructure.
 It has two main aspects: compute and object storage.
 Compute: software for creating and managing large groups of virtual private servers.
Object Storage: redundant, scalable object storage for storing terabytes or petabytes of data.
OpenStack Compute

1. Nova is the cloud computing fabric controller, the main part of an IaaS system.
2. Nova is built on a shared-nothing, messaging-based architecture.
3. Components communicate through message queues.
OpenStack Compute

4. Shared-nothing paradigm:
the overall system state is kept in a distributed data store;
state updates are made through atomic transactions.
5. Nova is implemented in Python, using several external libraries and components.
6. These include boto, an Amazon API binding in Python, and a fast HTTP server (Tornado) used to implement the S3 interface in OpenStack.
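A sketch of launching a server through the openstacksdk Python library against a Nova endpoint; the cloud name, image, flavor, and network are placeholders typical of a small test deployment:

```python
# Sketch: booting a VM via OpenStack Compute (Nova) with the openstacksdk library.
import openstack

conn = openstack.connect(cloud='mycloud')          # placeholder clouds.yaml entry

image = conn.compute.find_image('cirros')          # placeholder image name
flavor = conn.compute.find_flavor('m1.tiny')       # placeholder flavor
network = conn.network.find_network('private')     # placeholder network

server = conn.compute.create_server(
    name='unit4-demo',
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{'uuid': network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.name, server.status)
```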
AoE (ATA over Ethernet) gives simple, high-performance access to block storage devices; LDAP (Lightweight Directory Access Protocol) is used to manage users and projects.
Nova manages private networks, public IP addressing, virtual private network (VPN) connectivity, and firewall rules.

The NetworkController manages address and virtual LAN (VLAN) allocations.
The RoutingNode governs NAT conversion from public IPs to private IPs and enforces firewall rules.
The AddressingNode runs Dynamic Host Configuration Protocol (DHCP) services for private networks.
The TunnelingNode provides VPN connectivity.
The network state consists of the following:
 VLAN assignment to a project
 Private subnet assignment to a security group in a VLAN
 Private IP assignments to running instances
 Public IP allocations to a project
 Public IP associations to a running instance
OpenStack Storage
The main components are the proxy server, the ring, the object server, the container server, the account server, and replication.
 The proxy server looks up the locations of accounts, containers, or objects in the OpenStack storage rings and routes the requests accordingly.
 Any object is streamed directly through the proxy server to or from the user.
OpenStack Storage

 A ring represents a mapping between the names of entities and their physical locations.
 Separate rings exist for accounts, containers, and objects.
 A ring is made up of zones, devices, partitions, and replicas.
Manjrasoft Aneka Cloud and Appliances
1. Manjrasoft is based in Melbourne, Australia.
2. Aneka supports rapid development and deployment of applications on private or public clouds.
3. It provides rich APIs for applications.
4. It offers tools to monitor and control the deployed infrastructure.
Aneka applications run on Linux and on Microsoft .NET framework environments.

Advantages of Aneka
• Support for multiple programming environments.
• QoS/SLA-based allocation of applications to multiple virtual and/or physical machines.
Aneka offers three types of capabilities: building, accelerating, and managing clouds.
1. Build
Aneka provides an SDK to rapidly develop applications.
It supports private, public, and hybrid clouds.
2. Accelerate
Rapid deployment on Windows/Linux/UNIX.
 Uses physical machines to achieve maximum utilization.
When local resources are insufficient, extra capacity can be leased dynamically from EC2.
3. Manage
A GUI and APIs to set up, monitor, manage, and maintain remote and global Aneka compute clouds.
Accounting and SLA/QoS-based scaling enable dynamic provisioning.
Three important programming models supported by Aneka
1. Thread programming model
2. Task programming model
3. MapReduce programming model
Tasks are more lightweight than threads:
tasks use fewer system resources, such as memory and CPU time, compared to threads, and
tasks are easier to manage than threads.
Three services
1. Fabric Services:
 high availability (HA) and improved reliability,
 node membership and directory,
 resource provisioning, performance monitoring, and hardware profiling.
Three services
2. Foundation Services:
application execution and infrastructure services used by administrators and developers:
storage management, resource reservation, reporting, accounting, billing, services monitoring, and licensing.
3. Application Services:
execution of applications in the appropriate runtime environment;
application execution scalability, data transfer, performance monitoring, accounting, and billing.
Virtual Appliances in Aneka

VM and P2P network virtualization technologies are integrated to enable simple deployment.
Virtual appliances are VM images packaged with the installation and configuration of a complete software stack (OS, libraries, binaries, configuration files, and auto-configuration scripts), so the required environment is available as soon as the virtual appliance is instantiated.
