07 - Module 7
© Copyright 2014 EMC Corporation. All rights reserved. Module: Business Continuity 1
Cloud Computing Reference Model
Business Continuity Cross-layer Function
Lesson: Business Continuity Overview
This lesson covers the following topics:
• Business continuity
• Cloud service availability
• Causes of service unavailability
• Impact of cloud service unavailability
• Key methods to achieve the required cloud service availability
What is Business Continuity?
Business Continuity
BC entails preparing for, responding to, and recovering from a service outage that
adversely affects business operations.
Cloud Service Availability
Refers to the ability of a cloud service to perform its agreed function according to
business requirements and customer expectations during its specified time of
operation.
• Service availability is based on the agreed service time and the downtime
(Agreed service time is the period during which the service is supposed to be available.)
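The slide says only that availability depends on agreed service time and downtime. A common way to express this, assumed here rather than quoted from the slide, is availability (%) = (agreed service time - downtime) / agreed service time × 100. The minimal sketch below computes it. The function name and the use of hours as the unit are illustrative assumptions.

```python
# Illustrative sketch; the function name and units (hours) are assumptions.
def service_availability(agreed_service_time_h: float, downtime_h: float) -> float:
    """Return availability as a percentage of the agreed service time."""
    if agreed_service_time_h <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_h <= agreed_service_time_h:
        raise ValueError("downtime must lie within the agreed service time")
    return (agreed_service_time_h - downtime_h) / agreed_service_time_h * 100


# A service agreed to run 24x7 over a 30-day month (720 h) was down for 2 h.
print(f"{service_availability(720, 2):.2f}%")  # 99.72%
```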
Causes of Cloud Service Unavailability
• Application failure
– For example, due to catastrophic exceptions caused by bad logic
• Data loss
• Infrastructure component failure
• Failure of dependent services
• Data center or site down
• Refreshing IT infrastructure
Impact of Cloud Service Unavailability
Methods to Achieve Required Cloud Service Availability
• Building a resilient cloud infrastructure helps meet the required service availability
• Building a resilient cloud infrastructure requires various high availability solutions
– Implementing fault tolerance mechanisms
• Deploying redundancy at both the cloud infrastructure component level and the site level to avoid single points of failure
– Deploying data protection solutions such as backup and replication
– Implementing automated cloud service failover
– Architecting resilient cloud applications
Lesson Summary
During this lesson the following topics were covered:
• Business continuity
• Cloud service availability
• Causes of service unavailability
• Impact of cloud service unavailability
• Methods to achieve the required cloud service availability
Lesson: Building Fault-Tolerant Cloud Infrastructure – 1
This lesson covers the following topics:
• Avoiding single points of failure
• Key fault tolerance mechanisms
Single Points of Failure
Avoiding Single Points of Failure
• Single points of failure can be avoided by implementing fault
tolerance mechanisms such as redundancy
– Implement redundancy at component level
• Compute
• Storage
• Network
Implementing Redundancy at Component Level
• Key techniques to protect compute
– Clustering
– VM live migration
• Key techniques to protect network connectivity
– Link and switch aggregation
– NIC teaming
– Multipathing
– In-service software upgrade
– Configuring redundant hot swappable components
• Key techniques to protect storage
– RAID and erasure coding
– Dynamic disk sparing
– Configuring redundant storage system components
Compute Clustering
A technique where at least two compute systems (or nodes) work together and are
viewed as a single compute system to provide high availability and load balancing.
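The toy heartbeat model below illustrates one way a cluster can detect a failed node and move its services to a survivor. The `Node` and `Cluster` classes, the timeout value, and the least-loaded placement rule are all hypothetical. Real cluster software adds membership, quorum, and fencing logic that is not shown here.

```python
# Hypothetical toy model of heartbeat-based cluster failover; names are illustrative.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0
    services: set = field(default_factory=set)


class Cluster:
    """Services on a node that stops sending heartbeats move to a surviving peer."""

    def __init__(self, nodes, heartbeat_timeout):
        self.nodes = {n.name: n for n in nodes}
        self.timeout = heartbeat_timeout

    def heartbeat(self, name, now):
        self.nodes[name].last_heartbeat = now

    def check(self, now):
        """Fail over services from nodes whose last heartbeat is older than the timeout."""
        alive = [n for n in self.nodes.values() if now - n.last_heartbeat <= self.timeout]
        failed = [n for n in self.nodes.values() if now - n.last_heartbeat > self.timeout]
        if not alive:
            raise RuntimeError("no surviving node to fail over to")
        for node in failed:
            for svc in sorted(node.services):
                # Place each service on the surviving node running the fewest services.
                target = min(alive, key=lambda n: len(n.services))
                target.services.add(svc)
            node.services.clear()
        return [n.name for n in failed]


cluster = Cluster([Node("node-a", 0.0, {"db"}), Node("node-b", 0.0, {"web"})],
                  heartbeat_timeout=3.0)
cluster.heartbeat("node-b", 5.0)                  # node-a has stopped sending heartbeats
print(cluster.check(now=5.0))                     # ['node-a']
print(sorted(cluster.nodes["node-b"].services))   # ['db', 'web']
```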
Hypervisor Cluster
• Multiple hypervisors running on different compute systems are clustered
• Provides continuous availability of services running on VMs
even if a physical compute system or a hypervisor fails
– Typically a live instance (i.e., a secondary VM) of a primary VM is
created on another compute system
Virtual Machine Live Migration
• Running VMs, and the services on them, are moved from one physical compute
system to another without any downtime
– Allows scheduled maintenance without any downtime
– Facilitates VM load balancing
Link and Switch Aggregation
• Link aggregation
– Combines multiple links between two switches, or between a switch and a
node, into a single logical link
– Enables network traffic failover in the event
of a link failure in the aggregation
– Enables distribution of network traffic across
links in the aggregation
• Switch aggregation
– Provides fault tolerance against switch and
link failures
– Improves node performance by providing
more active paths and bandwidth
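The minimal sketch below makes the failover and traffic-distribution behavior of link aggregation concrete using flow hashing. It assumes a CRC32 hash over source and destination addresses. Actual switches and standards such as IEEE 802.1AX (LACP) let vendors choose the hash inputs, so the class and method names here are hypothetical.

```python
# Hypothetical sketch of flow hashing in a link aggregation group.
import zlib


class LinkAggregationGroup:
    """Each flow is hashed onto one active member link."""

    def __init__(self, links):
        self.links = list(links)
        self.active = set(self.links)

    def fail(self, link):
        self.active.discard(link)

    def restore(self, link):
        self.active.add(link)

    def select_link(self, src, dst):
        # Hashing the flow's endpoints keeps one flow on one link (so its packets
        # stay in order) while different flows spread across the links.
        candidates = [link for link in self.links if link in self.active]
        if not candidates:
            raise RuntimeError("all links in the aggregation are down")
        return candidates[zlib.crc32(f"{src}->{dst}".encode()) % len(candidates)]


lag = LinkAggregationGroup(["eth0", "eth1"])
flows = [("10.0.0.1", f"10.0.1.{i}") for i in range(8)]
print({lag.select_link(*f) for f in flows})  # flows typically land on both links
lag.fail("eth0")
print({lag.select_link(*f) for f in flows})  # {'eth1'}: traffic failed over
```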
NIC Teaming
A link aggregation technique that groups NICs so that they appear as a single, logical
NIC to the OS or hypervisor.
Multipathing
• Enables a compute system to use multiple
paths for transferring data to a LUN
• Enables failover by redirecting I/O from a
failed path to another active path
• Performs load balancing by distributing I/O
across active paths
– Standby paths become active if one or more
active paths fail
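The toy path selector below sketches the multipathing behavior just described. It round-robins I/O across active paths and promotes a standby path when an active one fails. The class, the path names, and the round-robin policy are assumptions for illustration. Multipathing software typically offers several load-balancing policies.

```python
# Hypothetical toy multipathing model; path names and policy are illustrative.
from itertools import count


class MultipathDevice:
    """Round-robin I/O over active paths, with standby paths held in reserve."""

    def __init__(self, active, standby=()):
        self.active = list(active)
        self.standby = list(standby)
        self._counter = count()

    def path_failed(self, path):
        if path in self.active:
            self.active.remove(path)
            # Promote a standby path so I/O keeps flowing at full path count.
            if self.standby:
                self.active.append(self.standby.pop(0))

    def send_io(self, block):
        if not self.active:
            raise IOError("no path to LUN")
        path = self.active[next(self._counter) % len(self.active)]
        return path, block


dev = MultipathDevice(active=["hba0:spA", "hba1:spB"], standby=["hba0:spB"])
print([dev.send_io(b)[0] for b in range(4)])
# ['hba0:spA', 'hba1:spB', 'hba0:spA', 'hba1:spB']
dev.path_failed("hba0:spA")   # I/O is redirected from the failed path
print(dev.active)             # ['hba1:spB', 'hba0:spB']
```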
In-Service Software Upgrade (ISSU)
• Allows updating software on network devices (switches and
routers) without impacting network availability
– Eliminates the need to stop ongoing processes on a device
– Maintains network availability during network device
maintenance or upgrade processes
• Typically requires a network device with redundant control
plane elements (supervisor or routing engines)
– This setup allows the administrator to update the software image
on one engine while the other maintains network availability
RAID and Dynamic Disk Sparing
• RAID
– Combines multiple drives into a logical unit to protect data against drive
failure
(Figure: RAID 6 – Dual Distributed Parity)
• Dynamic disk sparing
– Automatically replaces a failed drive
with a spare drive to protect against
data loss
– Multiple spare drives can be configured
to improve availability
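Parity is what allows a failed drive to be rebuilt, for example onto a spare. For brevity, the sketch below uses single XOR parity (as in RAID 5). RAID 6 adds a second, independently computed parity block (Q) so that two drive failures can be survived, and that part is not shown. The function names are hypothetical.

```python
# Illustrative single-parity (XOR) rebuild; RAID 6's second parity is not modeled.
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(stripe, parity):
    """Recreate the single missing block (marked None) from the survivors and parity."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one lost block")
    survivors = [blk for blk in stripe if blk is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks(survivors + [parity])
    return rebuilt


data = [b"A1A1", b"A2A2", b"A3A3"]
parity = xor_blocks(data)             # stored on the parity drive
degraded = [data[0], None, data[2]]   # the drive holding A2 has failed
print(rebuild(degraded, parity)[1])   # b'A2A2', which can be written to a spare drive
```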
Erasure Coding
• Provides space-optimal data redundancy to protect against data loss from
multiple drive failures
– A set of n disks is divided into m disks that hold data and k disks that
hold coding information (n = m + k)
– Coding information is calculated from the data
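The m/k split determines both the capacity overhead and the number of drive failures tolerated. The sketch below assumes a maximum distance separable (MDS) code such as Reed-Solomon, where any m of the n disks suffice to reconstruct the data, so up to k failures are tolerated. The class name and the 12 + 4 layout are illustrative.

```python
# Capacity/tolerance arithmetic for an m + k erasure-coded layout (assumes an MDS code).
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureScheme:
    m: int  # disks holding data
    k: int  # disks holding coding information

    @property
    def n(self) -> int:
        return self.m + self.k

    @property
    def tolerated_failures(self) -> int:
        # With an MDS code, any m of the n disks can reconstruct the data.
        return self.k

    @property
    def usable_fraction(self) -> float:
        return self.m / self.n


def can_recover(scheme: ErasureScheme, failed_disks: int) -> bool:
    return failed_disks <= scheme.tolerated_failures


ec = ErasureScheme(m=12, k=4)
print(ec.n, ec.usable_fraction, can_recover(ec, 4), can_recover(ec, 5))
# 16 0.75 True False
```

For comparison, surviving four drive failures with mirroring alone would take five full copies, leaving 20% of raw capacity usable versus 75% here.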
Storage Resiliency Using Mirrored LUN
• A mirrored LUN is created using a
virtualization appliance
– Each write I/O to the LUN is mirrored to the
LUNs on the underlying storage systems
– The mirrored LUN remains continuously available
to the compute system
• Even if one of the storage systems becomes
unavailable due to a failure
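The toy model below shows the mirrored LUN's behavior. Every write goes to each available storage system, and reads are served from any healthy copy. The classes are hypothetical. Resynchronizing a storage system after it comes back online is deliberately left out.

```python
# Hypothetical toy model of a mirrored LUN presented by a virtualization appliance.
class StorageSystem:
    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True


class MirroredLUN:
    def __init__(self, *backends):
        self.backends = backends

    def write(self, lba, data):
        online = [b for b in self.backends if b.online]
        if not online:
            raise IOError("no storage system available")
        for backend in online:          # every write goes to each available copy
            backend.blocks[lba] = data

    def read(self, lba):
        for backend in self.backends:   # serve the read from the first healthy copy
            if backend.online:
                return backend.blocks[lba]
        raise IOError("no storage system available")


array1, array2 = StorageSystem("array-1"), StorageSystem("array-2")
lun = MirroredLUN(array1, array2)
lun.write(0, b"payroll")
array1.online = False    # one storage system fails
print(lun.read(0))       # b'payroll', still served by array-2
```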
Lesson Summary
During this lesson the following topics were covered:
• Single points of failure
• Clustering and VM live migration
• Aggregation and multipathing
• In-service software upgrade
• RAID, erasure coding, and dynamic disk sparing
• Storage resiliency using mirrored LUN
Lesson: Building Fault-Tolerant Cloud Infrastructure – 2
This lesson covers the following topics:
• Service availability zone
• Automated service failover across zones
• Active/passive and active/active zone configurations
• VM live migration across zones using a stretched cluster
Service Availability Zones
• A service availability zone is a location with its own set of
resources, isolated from other zones
– A zone can be part of a data center or may even comprise
the whole data center
• Enables running multiple service instances within and across zones to
survive a data center or site failure
• In the event of an outage, the service should seamlessly fail over across
zones
Automated Service Failover Across Zones
• Automated service failover
– Ensures robust and consistent failover
– Helps meet stringent service levels
• Reduces RTO
Active/Passive Zone Configuration
Active/Active Zone Configuration
VM Migration Across Zones Using Stretched Cluster
Lesson Summary
During this lesson the following topics were covered:
• Service availability zones
• Active/passive and active/active zone configurations
• VM migration across zones using stretched cluster
Lesson: Data Protection Solution – Backup
This lesson covers the following topics:
• Backup and recovery
• Backup requirements in a cloud environment
• Guest-level and image-level backup methods
• Backup as a Service
• Backup service deployment options
• Deduplication for backup environment
Data Protection Overview
• Protecting critical data ensures availability of services
– Seamless service failover requires the availability of data
• Businesses also implement data protection solutions in order to
comply with regulatory requirements
• Individual services and their associated data sets have different
business values and therefore require different data protection strategies
• Two common data protection solutions:
– Backup
– Replication
Introduction to Backup and Recovery
Backup
An additional copy of production data, created and retained for the sole purpose of
recovering lost or corrupted data.
Backup Requirements in a Cloud Environment
• Backup requires integration between the backup application and the
management server of the virtualized environment
• Backup requirements may differ from one service to another
based on RTO and RPO
– Requires well-defined backup strategies to meet the requirements
• Recovery must support file-level and/or full VM recovery
• Large volume of redundant data in the backup environment
– Many VMs have identical data and configurations
• Backup and recovery operations need to be automated
Key Backup Components
• Backup client
– Gathers the data that is to be backed up
– Sends the data to the storage node
• Backup server
– Manages backup operations
– Maintains backup catalog
• Storage node
– Responsible for writing data to backup device
• Backup device (backup target)
– Tape library, disk library, and virtual tape library
Backup Targets
• Tape library
– Tapes are portable and can be used for long-term offsite storage
– Must be stored in locations with a controlled environment
– Not optimized to recognize duplicate content
– Data integrity and recoverability are major issues with tape-based backup media
• Disk library
– Enhanced backup and recovery performance
– Disks offer faster recovery than tapes
– No inherent off-site capability; depends on additional technologies such as
replication to comply with off-site requirements
– Disk-based backup appliances include features such as deduplication, compression,
encryption, and replication to support business objectives
• Virtual tape library
– Disks are emulated and presented as tapes to backup software
– Does not require any additional modules or changes in legacy backup software
– Provides better performance and reliability than physical tape
– Does not require the usual maintenance tasks associated with a physical tape drive,
such as periodic cleaning and drive calibration
Backup Methods
• Two key backup methods:
– Guest-level
– Image-level
Guest-level Backup
• Backup agent is installed on each VM
– Performs file-level backup and recovery
– Does not back up VM configuration files
• Performing backup on multiple VMs on a compute system may
consume more resources and lead to resource contention
– Impacts performance of applications running on VMs
(Figure: guest-level backup, with a backup agent (A) installed on each VM)
Backup as a Service
• Enables consumers to procure backup services on demand
– Provides offsite backup for consumer desktops, laptops, and
application servers
– Backs up data to cloud storage
• Reduces the backup management overhead
– Transformation from CAPEX to OPEX
– Pay-per-use/subscription-based pricing
• Gives consumers the flexibility to select a backup technology
based on their current requirements
Backup Service Deployment Options
• Local backup service
– Suitable when a cloud service provider already hosts consumer applications and data
– Backup service is offered by the provider to protect the consumer's data
– Backup is managed by the service provider
• Replicated backup service
– Service provider only manages data replication and the IT infrastructure at the disaster recovery site
– Local backups are managed by consumers
Drivers for Optimizing Backup
Introduction to Data Deduplication
Data Deduplication
The process of detecting and identifying the unique data segments within a given set
of data to eliminate redundancy.
• Deduplication process
– Chunk the data set
– Identify duplicate chunks
– Eliminate the redundant chunks
(Figure: before deduplication, total segments = 39; after deduplication, unique segments = 3)
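The three steps of deduplication (chunk, identify, eliminate) can be sketched with fixed-size chunks and SHA-256 fingerprints. Both choices are assumptions. Many products use variable-length chunking, and the 4-byte chunk size here only keeps the example small. The function names are hypothetical.

```python
# Illustrative fixed-size-chunk deduplication; chunking scheme and names are assumptions.
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Chunk the data, fingerprint each chunk, and store each unique chunk once."""
    store = {}    # fingerprint -> chunk (unique segments)
    recipe = []   # ordered fingerprints needed to rebuild the data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)   # a duplicate chunk is not stored again
        recipe.append(fp)
    return store, recipe


def restore(store, recipe) -> bytes:
    return b"".join(store[fp] for fp in recipe)


data = b"ABCD" * 5 + b"WXYZ" * 3
store, recipe = deduplicate(data)
print(len(recipe), len(store))   # 8 2  (total segments, unique segments)
```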
Deduplication Granularity Level
• File-level deduplication
– Detects and removes redundant copies of identical files
– Only one copy of the file is stored; the subsequent copies are
replaced with a pointer to the original file
• Does not address the problem of duplicate content inside the files
Deduplication Method
• Source-based deduplication
– Eliminates redundant data at the source (backup client)
– Client sends only new, unique segments across the network
– Reduces storage and network bandwidth requirements
– Increases overhead on the backup client
• Target-based deduplication
– Offloads deduplication process from the backup client
– Data is deduplicated at the target either inline or post-process
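Source-based deduplication can be sketched as a two-step exchange. The client sends fingerprints first, and the target replies with the ones it does not already hold, so only new segments cross the network. The `BackupTarget` class and the exchange itself are hypothetical simplifications, not a specific product's protocol.

```python
# Hypothetical sketch of source-based deduplication between a client and a target.
import hashlib


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class BackupTarget:
    """Holds the unique segments already protected."""

    def __init__(self):
        self.segments = {}

    def missing(self, fingerprints):
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, segments):
        self.segments.update(segments)


def source_side_backup(chunks, target):
    """Client hashes locally and ships only the segments the target lacks."""
    fps = [fingerprint(c) for c in chunks]
    needed = target.missing(fps)
    to_send = {fp: c for fp, c in zip(fps, chunks) if fp in needed}
    target.store(to_send)
    return len(to_send)   # segments that actually crossed the network


target = BackupTarget()
print(source_side_backup([b"os", b"app", b"os"], target))   # 2
print(source_side_backup([b"os", b"app", b"log"], target))  # 1
```

The trade-off matches the slide. Network and storage use drop, but the client pays for the hashing.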
Lesson Summary
During this lesson the following topics were covered:
• Backup requirements in a cloud environment
• Guest-level and image-level backup methods
• Backup as a Service
• Backup service deployment options
• Source-based and target-based deduplication
Lesson: Data Protection Solution – Replication
This lesson covers the following topics:
• Replication and its types
• Snapshot and mirroring
• Synchronous and asynchronous remote replication
• Continuous Data Protection (CDP)
• Disaster Recovery as a Service (DRaaS)
Introduction to Replication
Replication
Process of creating an exact copy (replica) of the data for ensuring availability of
services.
Local Replication: Snapshot
• A virtual copy of a set of files or a volume as it appeared at a
particular point in time (PIT)
– Provides the ability to restore the files or volumes in case of data
loss or corruption
• A VM snapshot is a common snapshot technique that preserves the
state and data of a VM at a specific PIT
– When a snapshot is created, a child virtual disk (delta disk file) is
created from the base image or parent virtual disk
– Successive snapshots generate a new child virtual disk from the
previous child virtual disk
– Snapshots hold only changed blocks
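The parent/child disk chain can be modeled as follows. Writes land in the newest child (delta) disk, reads fall through to the parent when a block has not changed, and the parent keeps the point-in-time contents. This is a simplified illustration with hypothetical class names, not any hypervisor's on-disk format.

```python
# Hypothetical model of a VM snapshot chain built from delta disks.
class VirtualDisk:
    """A parent or child (delta) virtual disk in a snapshot chain."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}              # only the blocks written to this disk

    def write(self, block, data):
        self.blocks[block] = data     # new writes land in the newest disk only

    def read(self, block):
        disk = self
        while disk is not None:       # fall through toward the base image
            if block in disk.blocks:
                return disk.blocks[block]
            disk = disk.parent
        return None                   # block never written


base = VirtualDisk()
base.write(0, "boot")
base.write(1, "v1")
child1 = VirtualDisk(parent=base)     # snapshot 1: base now holds the PIT image
child1.write(1, "v2")
child2 = VirtualDisk(parent=child1)   # snapshot 2: child of the previous child
child2.write(2, "log")
print(child2.read(0), child2.read(1), child2.read(2))  # boot v2 log
print(base.read(1))                   # v1: the PIT contents are preserved
```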
Local Replication: Mirroring
Remote Replication: Synchronous
• A write is committed to both the source and the remote replica before it
is acknowledged to the compute system
• Ensures that the source and the replica have identical data at all times
– Provides near-zero RPO
(Figure: source storage in the primary zone (source site) replicating to replica storage in the secondary zone (remote site))
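Synchronous replication gets its near-zero RPO from the order of acknowledgement. This sketch uses in-memory dictionaries as stand-ins for the source and replica storage (hypothetical names). It ignores link latency and failure handling, which real storage systems handle in their own ways.

```python
# Hypothetical sketch of synchronous remote replication ordering.
class Volume:
    def __init__(self):
        self.blocks = {}


def synchronous_write(source, replica, block, data):
    source.blocks[block] = data    # write committed at the source
    replica.blocks[block] = data   # write sent to and committed at the remote replica
    return "ack"                   # only now is the compute system acknowledged


src, dst = Volume(), Volume()
for i, data in enumerate([b"a", b"b", b"c"]):
    assert synchronous_write(src, dst, i, data) == "ack"
    assert src.blocks == dst.blocks   # replica is identical at every acknowledgement
print(len(dst.blocks))  # 3
```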
Remote Replication: Asynchronous
• A write is committed to the source and immediately
acknowledged to the compute system
– Data is buffered at the source and transmitted to the remote site
later
– Replica will be behind the source by a finite amount (finite RPO)
(Figure: source storage in the primary zone (source site) replicating to replica storage in the secondary zone (remote site))
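In contrast to synchronous replication, an asynchronous replicator acknowledges immediately and ships buffered writes later. The still-buffered writes are the data that would be lost if the source site failed, which is the finite RPO. The class and its `transmit` batching are hypothetical.

```python
# Hypothetical sketch of asynchronous remote replication with a write buffer.
from collections import deque


class AsyncReplicator:
    """Acknowledge at the source immediately; ship buffered writes later."""

    def __init__(self):
        self.source = {}
        self.replica = {}
        self.buffer = deque()   # writes not yet transmitted to the remote site

    def write(self, block, data):
        self.source[block] = data
        self.buffer.append((block, data))
        return "ack"            # the compute system does not wait on the remote site

    def transmit(self, max_writes):
        """Send up to max_writes buffered writes, in order, to the remote site."""
        for _ in range(min(max_writes, len(self.buffer))):
            block, data = self.buffer.popleft()
            self.replica[block] = data

    def lag(self):
        return len(self.buffer)  # writes that would be lost if the source failed now


rep = AsyncReplicator()
for i in range(5):
    rep.write(i, f"txn-{i}")
rep.transmit(max_writes=3)
print(rep.lag(), sorted(rep.replica))  # 2 [0, 1, 2]
```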
Advanced Replication Solution: CDP
• Provides the ability to restore data to any previous PIT
– Helps meet the required recovery level for an application
• Data changes are continuously captured and stored in a
location separate from the production data
• Supports both local and remote replication
– For operational recovery and disaster recovery, respectively
Key CDP Components
• Journal volume
– Contains all the data that has changed on the production volume since the
replication session started
• Journal size determines how far back in time the recovery points can go
• CDP appliance
– Intelligent hardware platform that runs the CDP software
• Manages both the local and the remote replications
• Write splitter
– Intercepts writes to the production volume from the compute
system and splits each write into two copies
• Can be implemented at the compute system, fabric, or storage system
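One simple way to model the journal is shown below. The write splitter copies every write to both the production volume and a time-ordered journal, and any PIT can be rebuilt by replaying journal entries up to that time onto the baseline. This is a hypothetical, in-memory sketch. Actual CDP appliances manage journals differently, and a bounded journal would limit how far back restores can go.

```python
# Hypothetical in-memory sketch of CDP: write splitter plus time-ordered journal.
import bisect


class CDPVolume:
    """A write splitter copies each write to production and to the journal."""

    def __init__(self, baseline):
        self.baseline = dict(baseline)    # volume contents when the session started
        self.production = dict(baseline)
        self.journal = []                 # (timestamp, block, data); assumes time order

    def write(self, ts, block, data):
        self.production[block] = data           # copy 1: production volume
        self.journal.append((ts, block, data))  # copy 2: journal

    def restore(self, point_in_time):
        """Rebuild the volume as of point_in_time by replaying the journal."""
        times = [ts for ts, _, _ in self.journal]
        cutoff = bisect.bisect_right(times, point_in_time)
        image = dict(self.baseline)
        for _, block, data in self.journal[:cutoff]:
            image[block] = data
        return image


vol = CDPVolume({"a": 0})
vol.write(10, "a", 1)
vol.write(20, "a", 2)
vol.write(30, "b", "corrupt")   # e.g. a bad write to roll back past
print(vol.restore(25))          # {'a': 2}
```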
CDP Operations: Local and Remote Replication
Replication Use Case: DRaaS
• Service provider offers resources to enable consumers to run
their IT services in the event of a disaster
– Resources at the service provider location can be dedicated to the
consumer or they can be shared
• Replication is a key technique used by the service provider in
order to offer DRaaS to the consumers
• Service provider should design, implement, and document a
DRaaS solution specific to the customer’s infrastructure
DRaaS – Normal Production Operation
• IT services run at the consumer’s production data center
• Replication occurs from the consumer production environment
to the service provider’s data center over the network
– Data is usually encrypted while replicating to the provider's
location
(Figure: normal production operation; VM instances are not allocated at the provider's site)
DRaaS – Business Disruption
• Business operations fail over to the provider's infrastructure in
the event of a disaster at the consumer's data center
– Users at the consumer organization are redirected to the cloud
• Typically, VM instances are created from a pool of compute resources
– Replicated storage is connected to each of the newly activated VMs
(Figure: after a disaster at the consumer's site, VM instances are invoked at the provider's site to run the service)
Lesson Summary
During this lesson the following topics were covered:
• Snapshot and mirroring
• Synchronous and asynchronous remote replication
• Continuous Data Protection
• Disaster Recovery as a Service
Lesson: Application Resiliency for Cloud
This lesson covers the following topics:
• Resilient cloud application
• Key design strategies for application resiliency
• Monitoring applications for availability
Resilient Cloud Applications Overview
• Cloud applications have to be designed to handle IT resource
failures in order to guarantee the required availability
• Fault-resilient applications have logic to detect and handle
transient fault conditions to avoid application downtime
• Key application design strategies for improving availability
– Graceful degradation of application functionality
– Retry logic in application code
– Persistent application state model
– Event-driven processing
Graceful Degradation
• The application maintains limited functionality even when some
of its modules or supporting services are unavailable
– Unavailability of certain application components or modules should
not bring down the entire application
• For example, an e-commerce site can continue to collect orders
even if its payment gateway is unavailable
– The orders can then be processed when the payment gateway
is available again or after failing over to a secondary gateway
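The e-commerce example can be sketched as follows. When the payment gateway raises an error, the order is accepted and queued instead of failing the whole checkout, and queued payments are charged later. `Shop`, `PaymentGatewayDown`, and the in-memory queue are hypothetical names chosen for illustration.

```python
# Hypothetical sketch of graceful degradation for an order-taking service.
from collections import deque


class PaymentGatewayDown(Exception):
    """Raised when the payment gateway cannot be reached."""


class Shop:
    """Keeps accepting orders while the payment gateway is unavailable."""

    def __init__(self, charge):
        self.charge = charge     # callable(order); may raise PaymentGatewayDown
        self.pending = deque()   # accepted orders still awaiting payment

    def place_order(self, order):
        try:
            self.charge(order)
            return "paid"
        except PaymentGatewayDown:
            self.pending.append(order)   # degrade: accept now, charge later
            return "accepted, payment pending"

    def process_pending(self):
        """Charge queued orders once a gateway (primary or secondary) is back."""
        while self.pending:
            self.charge(self.pending[0])  # raises again if still down; order stays queued
            self.pending.popleft()


gateway_up = False


def charge(order):
    if not gateway_up:
        raise PaymentGatewayDown(order)


shop = Shop(charge)
print(shop.place_order("order-1"))  # accepted, payment pending
gateway_up = True
shop.process_pending()
print(len(shop.pending))            # 0
```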
Fault Detection and Retry Logic
• Refers to logic implemented in an application's code to improve
availability
– Detects a service that is temporarily down and retries the request
• A retry may succeed once the service is restored
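A common form of retry logic, assumed here, retries only errors known to be transient. It waits exponentially increasing delays plus random jitter so that many clients do not retry in lockstep. The exception type, attempt limit, and delays are illustrative. `sleep` is injectable so the example runs instantly.

```python
# Illustrative retry with exponential backoff and jitter; names and limits are assumptions.
import random
import time


class TransientError(Exception):
    """Raised by a dependency that is only temporarily unavailable."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise                    # give up and surface the fault to the caller
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))  # jitter de-synchronizes clients


calls = {"n": 0}


def flaky_service():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("service temporarily down")
    return "ok"


print(call_with_retry(flaky_service, sleep=lambda s: None))  # ok (succeeds on attempt 3)
```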
Persistent Application State Model and Event-driven Processing
• Persistent application state model
– Application state information is stored out of the memory
• Stored in a data repository
– If an instance fails, the state information is still available in the
repository
• Asynchronous event-driven processing
– Applications are written in a way to process the user request from
a queue asynchronously instead of synchronous call
• Allows multiple applications instances to process requests
• If an instance is lost, the impact is minimal
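The two patterns can be combined in a short sketch. Requests go into a queue, any of several worker instances can take them, and state is kept in a shared store rather than in a worker's memory. A Python dict guarded by a lock stands in for the external data repository. That stand-in is an assumption made only to keep the example self-contained, and all names are hypothetical.

```python
# Hypothetical sketch: queue-based event processing with state kept outside the workers.
import queue
import threading

state_store = {}                 # stand-in for an external data repository
state_lock = threading.Lock()
events = queue.Queue()


def worker():
    """Any instance can handle any request; no state is kept in the worker itself."""
    while True:
        item = events.get()
        if item is None:          # shutdown signal
            events.task_done()
            return
        user, amount = item
        with state_lock:
            state_store[user] = state_store.get(user, 0) + amount
        events.task_done()


instances = [threading.Thread(target=worker) for _ in range(3)]
for t in instances:
    t.start()
for event in [("alice", 5), ("bob", 2), ("alice", 3)]:
    events.put(event)            # the caller does not wait on a specific instance
events.join()                    # wait until every event has been processed
print(sorted(state_store.items()))  # [('alice', 8), ('bob', 2)]
for _ in instances:
    events.put(None)
for t in instances:
    t.join()
```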
Monitoring Application Availability
• Specialized tools provide the capability to monitor the
availability of application instances that run on VMs
– Minimizes downtime associated with application failure
– Typically, such a tool is integrated with VM management software
• When there is an error or failure in an application
– The tool attempts to restart the application within the VM
– If the application does not restart successfully, the tool
notifies the VM management software
• The VM management software in turn automatically restarts the VM
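The escalation sequence can be sketched as a single monitoring pass. The callbacks, the number of in-VM restart attempts, and the return strings are hypothetical. A real tool would call its VM management software's own API.

```python
# Hypothetical sketch of an application-availability monitor with escalation.
def monitor_once(app_healthy, restart_app, restart_vm, max_app_restarts=2):
    """Restart the app in the VM first, then escalate to a VM restart."""
    if app_healthy():
        return "healthy"
    for _ in range(max_app_restarts):
        restart_app()                  # attempt to restart the application within the VM
        if app_healthy():
            return "application restarted"
    restart_vm()                       # hand off to the VM management software
    return "VM restarted"


vm = {"app_up": False, "vm_restarts": 0}


def restart_vm():
    vm["vm_restarts"] += 1
    vm["app_up"] = True                # assume the app comes back with the VM


result = monitor_once(lambda: vm["app_up"], restart_app=lambda: None, restart_vm=restart_vm)
print(result, vm["vm_restarts"])       # VM restarted 1
```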
Lesson Summary
During this lesson the following topics were covered:
• Graceful degradation of application functionality
• Retry logic in application code
• Persistent application state model
• Event-driven processing
• Monitoring application availability
Module Summary
Key points covered in this module:
• Business continuity
• Cloud service availability
• Fault tolerance mechanisms for cloud infrastructure
• Backup and deduplication
• Local and remote replication
• Fault resilient cloud application design strategies