Cloud Computing Disaster Recovery: Infrastructure Technician


Cloud Computing

Disaster Recovery
Infrastructure Technician
Sections
2.3 Describe the purpose of a disaster recovery plan
2.4 Identify where a disaster recovery plan can be found
2.5 Describe an infrastructure technician's role within a disaster recovery plan
2.6 List the typical items that should be contained within a disaster recovery plan
2.7 Explain when a disaster recovery plan can be tested
2.8 Explain how a disaster recovery plan can be tested
2.9 Explain how to implement recovery following the steps outlined in the disaster recovery plan
Purpose
Review your organisation’s plan and determine the purpose
Purpose
• To minimize interruptions to normal operations
• To limit the extent of disruption and damage
• To minimize the economic impact of the interruption
• To establish alternative means of operation in advance
• To train personnel in emergency procedures
• To provide for smooth and rapid restoration of service
Minimise downtime and data loss
• Minimising downtime and data loss is measured in terms of two concepts:

• Recovery time objective (RTO)
• the time within which a business process must be restored after a major incident (MI) has occurred
• Recovery point objective (RPO)
• the maximum age of the files that must be recovered from backup storage for normal operations to resume if a computer, system, or network goes down as a result of a major incident (in other words, the maximum acceptable data loss, measured in time)
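To make the two definitions concrete, here is a minimal Python sketch that checks a single incident against both objectives. The function name `check_incident` and the timeline are hypothetical; it assumes data loss is measured from the last recoverable backup to the moment of the incident, and downtime from the incident to the moment the process is restored.

```python
from datetime import datetime, timedelta

def check_incident(last_backup: datetime, incident: datetime, restored: datetime,
                   rto: timedelta, rpo: timedelta) -> dict:
    """Compare one incident's actual downtime and data loss with the RTO and RPO."""
    data_loss = incident - last_backup  # work done since the last recoverable copy is lost
    downtime = restored - incident      # time the business process was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }

# Hypothetical timeline: nightly backup at 02:00, incident at 10:30, service back at 14:00
result = check_incident(
    last_backup=datetime(2024, 1, 15, 2, 0),
    incident=datetime(2024, 1, 15, 10, 30),
    restored=datetime(2024, 1, 15, 14, 0),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=4),
)
print(result["downtime"], result["rto_met"])   # 3:30:00 True
print(result["data_loss"], result["rpo_met"])  # 8:30:00 False
```

In this hypothetical case the service came back within the RTO, but a nightly backup could not meet a 4-hour RPO.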
RTO & RPO

• RTO includes the time for:
• trying to fix the problem without a recovery
• the recovery itself
• testing
• communication to the users

• RPO
• sets the maximum data-loss window the backup strategy must work to
• eg: if RPO = 4 hours, then off-site mirrored backups must be continuously maintained; a daily off-site backup on tape will not suffice
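Because the RTO has to cover every phase listed for it, a quick sanity check is to total the estimated duration of each phase and compare the sum with the target. A minimal Python sketch; `rto_budget` is a hypothetical helper and the phase durations are made-up estimates, not figures from any plan.

```python
from datetime import timedelta

def rto_budget(phases: dict[str, timedelta], rto: timedelta) -> tuple[timedelta, bool]:
    """Add up the phases that count towards the RTO and compare the total with the target."""
    total = sum(phases.values(), timedelta())  # start the sum from a zero timedelta
    return total, total <= rto

# Hypothetical estimates for one business process
phases = {
    "trying to fix the problem without a recovery": timedelta(minutes=45),
    "the recovery itself": timedelta(hours=2),
    "testing": timedelta(minutes=30),
    "communication to the users": timedelta(minutes=15),
}
total, ok = rto_budget(phases, rto=timedelta(hours=4))
print(total, ok)  # 3:30:00 True
```

If the total exceeds the RTO, either a phase has to be shortened or the RTO has to be revisited with the business.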
RTO and RPO
• These should be set for each business process, as they will have different values
• Eg:
• Sales - CRM
• HR
• Payroll
• Stock control
• Orders
• Web site
Purpose
• Review your organisation’s plan and see if it has defined:
• RTO
• RPO
• Determine what the backup strategy should be to meet them
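The RPO example earlier (RPO = 4 hours rules out a daily tape backup) can be turned into a simple selection rule: pick the cheapest backup method whose worst-case data loss does not exceed the RPO. A minimal Python sketch; the candidate list, the relative costs, and the one-minute replication lag assumed for the mirror are hypothetical illustrations, not recommendations.

```python
from datetime import timedelta

# Hypothetical candidates: (name, worst-case data loss, relative cost where 1 = cheapest).
# With periodic backups the worst case is roughly the interval between backups,
# because an incident just before the next backup loses everything since the last one.
STRATEGIES = [
    ("weekly full backup on tape", timedelta(weeks=1), 1),
    ("daily off-site backup on tape", timedelta(hours=24), 2),
    ("continuously maintained off-site mirror", timedelta(minutes=1), 3),  # assumed replication lag
]

def choose_backup_strategy(rpo: timedelta) -> str:
    """Return the cheapest candidate whose worst-case data loss is within the RPO."""
    suitable = [s for s in STRATEGIES if s[1] <= rpo]
    if not suitable:
        raise ValueError(f"no candidate strategy meets an RPO of {rpo}")
    return min(suitable, key=lambda s: s[2])[0]

print(choose_backup_strategy(timedelta(hours=4)))   # continuously maintained off-site mirror
print(choose_backup_strategy(timedelta(hours=48)))  # daily off-site backup on tape
```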


Major incidents
• Natural
• Fire
• Flood
• Heat wave
• Blizzard
• Lightning

• Man-made
• Terrorism
• Fire
• Power failure
• Cyberattack
• Failed system change
Steps to be taken
• Depends on the nature of the major incident
• What are the possible outcomes from a major incident?
• Eg Unable to access building
Outcomes from an MI
• Loss of access to clients
• Loss of access to servers
• Loss of access to cloud
• Loss of access to buildings
• Loss of key personnel

• Check your disaster recovery plan for the steps to take for each of the above
Contents of a disaster recovery plan
• Purpose
• Risk assessment and business impact analysis
• Business process priorities
• Business continuity and recovery strategy
• Roles and responsibilities
• Key data
• Test plan
Risk assessment
• What can go wrong?
• How severe is it?
• How likely is it to occur?
• How can we calculate it?
• Assess the impact as: Low (1), Medium (2), High (3)
• Assess the probability as: Low (1), Medium (2), High (3)
• Multiply the two together
• 1 = low risk, 9 = high risk
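The calculation above can be expressed as a small function that scores each threat and ranks the results, so the highest risks are dealt with first. A minimal Python sketch; `risk_score` is a hypothetical name and the ratings in the example are illustrative only, not an assessment of any real site.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}

def risk_score(impact: str, probability: str) -> int:
    """Impact x probability, each rated Low (1), Medium (2) or High (3); result is 1 to 9."""
    return LEVELS[impact.lower()] * LEVELS[probability.lower()]

# Hypothetical ratings for illustration only; use your own risk assessment
threats = {
    "Power failure": ("medium", "medium"),
    "Flood": ("high", "low"),
    "Cyberattack": ("high", "high"),
}
ranked = sorted(threats, key=lambda t: risk_score(*threats[t]), reverse=True)
for name in ranked:
    print(name, risk_score(*threats[name]))
# Cyberattack 9
# Power failure 4
# Flood 3
```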
Risk assessment
• Risk assess each of the following:
• Fire (natural)
• Flood
• Heat wave
• Blizzard
• Lightning
• Terrorism
• Fire (man-made)
• Power failure
• Cyberattack
• Failed system change
Business impact analysis
• Determine which areas of the business are critical (urgent) and which are non-critical (non-urgent)
• Critical functions are those whose disruption is regarded as unacceptable
• Consider:
• Financial (eg lost sales per hour of downtime for an online retailer such as Amazon)
• Reputation (eg the TSB banking system migration failure in 2018)
• Customer/user retention

• Is there an impact analysis in your plan?
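The financial part of the analysis can be roughed out as lost revenue per hour multiplied by the length of the outage, then compared with the loss the business is willing to accept. A minimal Python sketch; the function names, the revenue figures and the tolerance are hypothetical, and reputation and customer retention still have to be judged separately.

```python
def downtime_cost(revenue_per_hour: float, outage_hours: float) -> float:
    """Lost revenue only; reputation and customer retention are not captured here."""
    return revenue_per_hour * outage_hours

def is_critical(revenue_per_hour: float, outage_hours: float, tolerable_loss: float) -> bool:
    """Treat a process as critical if an outage of this length costs more than the business accepts."""
    return downtime_cost(revenue_per_hour, outage_hours) > tolerable_loss

# Hypothetical revenue per hour for two processes, assessed for a 24-hour outage
processes = {"Web site": 20_000.0, "Stock control": 1_000.0}
for name, rate in processes.items():
    print(name, downtime_cost(rate, 24), is_critical(rate, 24, tolerable_loss=100_000))
# Web site 480000.0 True
# Stock control 24000.0 False
```

A revenue figure alone can understate some functions (payroll, for example, earns nothing directly but its failure is still unacceptable), which is why the other considerations in the list matter.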


Business process priorities
• No organization possesses infinite resources
• Criteria must be set for where to allocate resources first
• How long can the business operate without each critical system?
• Analysis should result in an RTO and RPO for each business area
• Resources should be allocated to the systems required to meet the RTOs and RPOs for each area

• Does your plan have priorities?
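Assuming a single recovery team that restores one system at a time, a simple way to set priorities is to restore processes in order of their RTO, most urgent first, and check whether each one finishes in time. A minimal Python sketch; `Process`, `recovery_order` and all the figures are hypothetical. Ordering by deadline minimises the worst overrun when jobs are done one after another, but it ignores dependencies between systems, which a real plan must account for.

```python
from dataclasses import dataclass

@dataclass
class Process:
    name: str
    rto_hours: float      # must be restored within this time after the incident
    restore_hours: float  # estimated time to restore it

def recovery_order(processes: list[Process]) -> list[tuple[str, float, bool]]:
    """Restore one process at a time, most urgent RTO first; report finish time and whether the RTO is met."""
    elapsed = 0.0
    plan = []
    for p in sorted(processes, key=lambda p: p.rto_hours):
        elapsed += p.restore_hours
        plan.append((p.name, elapsed, elapsed <= p.rto_hours))
    return plan

# Hypothetical objectives and restore estimates
plan = recovery_order([
    Process("Payroll", rto_hours=72, restore_hours=6),
    Process("Orders", rto_hours=4, restore_hours=3),
    Process("Web site", rto_hours=8, restore_hours=4),
])
for step in plan:
    print(step)
# ('Orders', 3.0, True)
# ('Web site', 7.0, True)
# ('Payroll', 13.0, True)
```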


Business continuity and recovery strategy
• What resources are required to re-establish operations?
• Physical facilities
• Computer hardware
• Computer software
• Communications links
• Data files and databases
• Customer services (telephony)
• User operations
• Management information systems (MIS)
Facilities
• Hot site
• A duplicate site
• Full systems and up to date files (mirrored, synchronised)
• Shortest switchover
• Most expensive
• Warm site
• Hardware and connectivity in place
• Requires backup recovery
• Facilities may be shared (may not be available if an MI affects many businesses)
• Cold site
• Space, power, communication lines
• Requires systems and backups to become operational
• Cheapest
Switching to a hot site
• Software is available to do this automatically
• But if not:
• Stop services at the failing site
• Switch the DNS records to point to the backup site
• Start services on the backup site
• Test that users can connect to the backup site and access services and data
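Where no failover software is in place, the manual steps above can still be scripted so they always run in the same order and stop at the first failure. A minimal Python sketch; `run_failover` is a hypothetical helper and the lambdas are placeholders for real actions (service manager, DNS provider, monitoring checks), which are not shown here.

```python
from typing import Callable

def run_failover(steps: list[tuple[str, Callable[[], bool]]]) -> bool:
    """Run failover steps in order; stop at the first step that fails so it can be investigated."""
    for description, step in steps:
        ok = step()
        print(f"{'OK  ' if ok else 'FAIL'} {description}")
        if not ok:
            return False
    return True

# Placeholders only: each lambda stands in for a real action and returns True on success.
# DNS changes reach clients only once their cached records expire, so the record's
# TTL affects how quickly users actually move to the backup site.
steps = [
    ("Stop services at the failing site", lambda: True),
    ("Switch the DNS records to point to the backup site", lambda: True),
    ("Start services on the backup site", lambda: True),
    ("Test that users can connect and access services and data", lambda: True),
]
print(run_failover(steps))
```

Stopping at the first failure avoids, for example, pointing DNS at a backup site whose services never started.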
Recovery strategies
• Restore facilities
• Restore communications
• Restore applications, systems, data
• Replace equipment
• Replace personnel
Roles and responsibilities
• Management team
• Co-ordinates the recovery process
• Assesses the disaster
• Activates the recovery plan
• Contacts team managers/leaders/key players
• Monitors the recovery process

• Team managers/leaders/key players
• Defined responsibilities for each recovery strategy
(Infrastructure Technicians should know their role: recovering systems as instructed by the manager/team leader)

• Does your plan have defined roles and responsibilities?


Key data
• Master call list (who to inform)
• Critical phone numbers
• Notification check list
• Inventories (equipment, hardware, software, telecoms)
• Standby locations
• Backup/retention schedules and locations
• Location and holders of the plan (key people should hold hard copies)

• Does your plan have this data?


Test plan
• Test at least annually, but set the frequency according to risk/impact
(eg backups should be tested weekly or monthly)
• Determine the feasibility and compatibility of backup facilities and procedures
• Identify areas in the plan that need modification
• Provide training to the team managers and team members
• Demonstrate the ability of the organisation to recover
• Provide motivation for maintaining and updating the disaster recovery plan
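Different tests therefore run at different frequencies, and it helps to check which ones are overdue. A minimal Python sketch; the test names, the 30-day backup-restore frequency and the dates are hypothetical, with only the annual full-plan test taken from the guidance above.

```python
from datetime import date, timedelta

# Hypothetical frequencies, set according to risk/impact; the plan as a whole at least annually
FREQUENCY = {
    "Full disaster recovery plan": timedelta(days=365),
    "Backup restore": timedelta(days=30),
}

def overdue(last_tested: dict[str, date], today: date) -> list[str]:
    """List the tests whose last run is older than their required frequency, or that never ran."""
    due = []
    for test, every in FREQUENCY.items():
        last = last_tested.get(test)
        if last is None or today - last > every:
            due.append(test)
    return due

print(overdue({"Full disaster recovery plan": date(2024, 3, 1),
               "Backup restore": date(2024, 5, 1)}, today=date(2024, 6, 15)))
# ['Backup restore']
```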
Test methods
• Paper test (cheapest)
• Individuals read and review the plan
• Establish a checklist (eg Fuel available for backup generator?)
• Walkthrough test
• Step through the plan with all key members present
• Simulation
• Scenario based (in addition to an MI, add minor incidents, eg no fuel for the generator)
• Parallel test
• Restore backups to a parallel system and run it alongside the production system
• Full interruption (most expensive)
• Shut down the production system and restore it
Technical items to test
• Restore data and files
• Restore a complete system
• Failover and failback (switch to standby and back again)
• Backup power (generators/UPS)
• Hot or warm site
• Stress test backup servers
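A restore test is only useful if the restored data is checked. One common approach is to compare a checksum of every original file with its restored copy. A minimal Python sketch using SHA-256 from the standard library; `verify_restore` and the sample files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(original_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the restored copy."""
    problems = []
    for src in original_dir.rglob("*"):
        if src.is_file():
            rel = src.relative_to(original_dir)
            dst = restored_dir / rel
            if not dst.is_file() or sha256_of(src) != sha256_of(dst):
                problems.append(str(rel))
    return sorted(problems)

# Demonstration with temporary folders standing in for the source and the restore target
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    original, restored = Path(a), Path(b)
    (original / "orders.csv").write_text("id,total\n1,9.99\n")
    (original / "notes.txt").write_text("backup test\n")
    (restored / "orders.csv").write_text("id,total\n1,9.99\n")  # restored correctly
    # notes.txt is deliberately missing from the restore
    result = verify_restore(original, restored)
print(result)  # ['notes.txt']
```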
Exercise
• Take a disaster recovery plan
• In a group perform a simulation
• You will be given a major incident
• There may be minor incidents

• Document any required improvements
