PROD - DR Plan - Overview


4/11/24, 8:19 PM PROD - DR Plan - Overview

PROD - DR Plan
Last updated by | Shreyans Jain | Mar 26, 2024 at 5:28 PM GMT+5:30

Contents
• General
• Global Resources
  • Azure KeyVault
  • Azure App Configuration
  • Azure Cosmos Mongo DB
    • Backup
  • Azure Storage account
    • Backup
• Region Specific Services
  • Azure SQL Server
    • DR Approach 1
    • DR Approach 2
    • DB Backup
    • Failover groups
  • HaProxy
  • Azure Kubernetes Service (AKS)
  • Azure Function App
  • Azure Service Bus
  • Azure Storage Account
• DR Execution Plan
  • Approach 1
    • Critical Resources:
    • Other Resources:
  • Approach 2
• References

General
Primary and Secondary Azure regions

Region    Primary     Secondary
Global    UK South    UK West
UK        UK South    UK West
US        East US     West US
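
The pairing above can be captured as a small lookup, which may help when scripts need to derive the DR region for a deployment. This is a minimal sketch: the names REGION_PAIRS and dr_region_for are illustrative assumptions, not part of the existing tooling.

```python
# Primary -> secondary region pairs, taken from the table above.
# REGION_PAIRS and dr_region_for are illustrative names (assumptions).
REGION_PAIRS = {
    "Global": ("UK South", "UK West"),
    "UK": ("UK South", "UK West"),
    "US": ("East US", "West US"),
}


def dr_region_for(deployment: str) -> str:
    """Return the secondary (DR) Azure region for a deployment region."""
    try:
        _primary, secondary = REGION_PAIRS[deployment]
    except KeyError:
        raise ValueError(f"Unknown deployment region: {deployment!r}") from None
    return secondary
```

For example, dr_region_for("US") returns "West US".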

https://dev.azure.com/Aptean/Paragon/_wiki/wikis/Paragon.wiki/4403/PROD-DR-Plan 1/6

Plan #1

We will follow a hybrid DR plan: some of the resources will be created along with the Prod infra, and the rest of the resources/services will be created during the DR process.
The following resources will be created before DR:
• VNet/subnet rules
• HaProxy
• HaProxy DR image in the Prod Global image gallery (one-time activity)
• AKS
• SQL server in the secondary region with 50% processing power

Plan #2

Create the DR infra from scratch.

Global Resources
Azure Cosmos Mongo DB
Azure KeyVault
Azure AppConfig
Azure Storage account

Azure KeyVault
Customer-configurable Geo-Replication is not available.

KeyVault has in-built cross-region replication to the Azure paired region, based on proximity.

Ex: UK West <--> UK South
    East US <--> West US

References
https://learn.microsoft.com/en-us/azure/key-vault/general/disaster-recovery-guidance
https://learn.microsoft.com/en-us/azure/reliability/cross-region-replication-azure#azure-paired-regions

- As there is no replication by Azure for this service, and all the KeyVault values are available in
- For now, we are thinking of recreating the KeyVault, upload the secrets, map with AppConfig and res
- What is the plan to update the secondary region's connection string?
Once the above is done, recreate the KeyVault, upload the secrets, map them with AppConfig and restart the pods.
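
The "map with AppConfig" step amounts to rewriting each Key Vault reference in App Configuration so that it points at the recreated vault. The sketch below only builds the reference payloads and does not call Azure. The vault name, the secret names and the "content_type"/"value" dictionary keys are assumptions for illustration. The content type string and the secret URI layout follow the Azure App Configuration Key Vault reference format.

```python
import json

# Content type Azure App Configuration uses for Key Vault references.
KEYVAULT_REF_CONTENT_TYPE = (
    "application/vnd.microsoft.appconfig.keyvaultref+json;charset=utf-8"
)


def keyvault_reference(vault_name: str, secret_name: str) -> dict:
    """Build the setting fields for one Key Vault reference."""
    uri = f"https://{vault_name}.vault.azure.net/secrets/{secret_name}"
    return {
        "content_type": KEYVAULT_REF_CONTENT_TYPE,
        "value": json.dumps({"uri": uri}),
    }


def remap_references(secret_names: list[str], new_vault: str) -> dict[str, dict]:
    """Point every secret reference at the recreated vault."""
    return {name: keyvault_reference(new_vault, name) for name in secret_names}
```

Calling remap_references(["sql-conn"], "kv-dr-example") with these hypothetical names yields one reference whose URI is https://kv-dr-example.vault.azure.net/secrets/sql-conn. The pods would still need a restart to pick up the new values, as noted above.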

Azure App Configuration


Geo-Replication available.

Replication : Read-access geo-redundant storage (RA-GRS)

- What is the plan to update the secondary region's connection string?

Azure Cosmos Mongo DB


Cosmos DB model : Request Unit (RU)


Geo-Replication available.
Write Locations : UK South (primary)
Read Locations : UK West (secondary)
Default consistency : Session

Backup
Backup policy mode : Periodic
Geo-redundant backup storage.
DB Backup Interval - 12 hrs.
DB Retention policy - 30 days.
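
As a quick worked check of what these backup settings imply: a restore from a periodic backup can lose up to one full backup interval of writes, and the retention window holds a fixed number of restore points. The sketch below is plain arithmetic on the values stated above. It covers restores from backup only, not a geo-replication failover, and the variable names are illustrative.

```python
from datetime import timedelta

BACKUP_INTERVAL = timedelta(hours=12)  # DB Backup Interval from the plan
RETENTION = timedelta(days=30)         # DB Retention policy from the plan

# Worst case, a restore loses everything written since the last backup,
# i.e. up to one full interval.
worst_case_data_loss = BACKUP_INTERVAL

# Number of backup points that fall inside the retention window.
restore_points = RETENTION // BACKUP_INTERVAL
```

With a 12-hour interval and 30-day retention this gives 60 restore points and a worst-case loss window of 12 hours.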

Azure Storage account


Geo-Replication available.
Storage account in Global region is Publicly accessible.
Redundancy : Read-access geo-zone-redundant storage (RA-GZRS)

Backup
Enable Azure Backup for blobs for 30 days?

Region Specific Services


• Public IP address
• HaProxy VMSS
• AKS
• Load balancer
• SQL Elastic Pool
• SQL Server
• SQL Database
• Service Bus
• Event Grid
• SignalR
• Storage account (2 Nos.)
• Function Apps (8 Nos.)
• App Service Plan (3 Nos.)

Region-specific cluster: upon failure, route traffic to the nearest cluster, or re-run Terraform for the same or a new region.


Azure SQL Server

DR Approach 1
The suggestion from the DBA team (current practice) is to use the 'Failover group' option for the SQL server.
Prod and DR SQL DBs will be active <-> active.
The DBA team takes responsibility for manually switching the DB pointing from the primary --> secondary region.
The secondary region's processing power will be 50% of the primary region's processing power; the DBA team can bump this up on a need basis.

DR Approach 2
Manually restore the backed-up DBs individually.

DB Backup
Geo-redundant backup storage.
DB Backup Interval - 12 hrs?
DB Retention policy - 30 days?

Failover groups
Read/Write failover policy : Customer managed
Read/Write grace period : 1 hr

Reference : https://learn.microsoft.com/en-us/azure/azure-sql/database/failover-group-sql-db?view=azuresql
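
One point relevant to the open connection-string question: an Azure SQL Database failover group exposes listener host names that follow the current primary, so applications that connect through the listener should not need a connection-string change after the DBA team switches regions. The sketch below builds those host names. The failover group name and database name are hypothetical, and the connection string omits credentials.

```python
def failover_group_endpoints(failover_group_name: str) -> dict[str, str]:
    """Return the listener host names of an Azure SQL Database failover group."""
    return {
        # Always resolves to the current primary (read/write).
        "read_write": f"{failover_group_name}.database.windows.net",
        # Always resolves to the current secondary (read-only).
        "read_only": f"{failover_group_name}.secondary.database.windows.net",
    }


def connection_string(host: str, database: str) -> str:
    """Build an ADO.NET-style connection string without credentials."""
    return f"Server=tcp:{host},1433;Initial Catalog={database};Encrypt=True;"
```

For a hypothetical group named fog-example, the read/write listener is fog-example.database.windows.net. Whether the applications here already use the listener rather than the server name is not stated in this plan.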

HaProxy
Separate HaProxy for each deployment region.

Approach 1

The DR HaProxy will be created beforehand, during Prod infra creation, to avoid a longer outage while re-creating the HaProxy. (Hybrid Approach)

Approach 2

DR HaProxy to be created during the DR process (approx. time to create 30-45 mins).

Azure Kubernetes Service (AKS)


Separate AKS for each deployment region.

Approach 1

The DR AKS will be created beforehand, during Prod infra creation, to avoid a longer outage while re-creating the AKS cluster. (Hybrid Approach)

Approach 2

DR AKS to be created during the DR process (approx. time to create 60-90 mins).
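
To make the trade-off between the two approaches concrete, the creation times quoted for HaProxy and AKS can be combined into a rough estimate of the outage that Approach 1 avoids. This is only arithmetic on the figures above. Whether the two resources can actually be created in parallel is an assumption, not something this plan states.

```python
# Approximate creation times in minutes (low, high), from the Approach 2 notes.
CREATE_MINUTES = {
    "HaProxy": (30, 45),
    "AKS": (60, 90),
}


def added_outage(parallel: bool) -> tuple[int, int]:
    """Estimate the outage added by creating HaProxy and AKS during DR."""
    lows = [low for low, _ in CREATE_MINUTES.values()]
    highs = [high for _, high in CREATE_MINUTES.values()]
    if parallel:
        # Both created at the same time: the slower resource dominates.
        return max(lows), max(highs)
    # Created one after the other: the times add up.
    return sum(lows), sum(highs)
```

Created sequentially, the two add roughly 90 to 135 minutes; created in parallel, roughly 60 to 90 minutes.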

Azure Function App


Geo-Replication is unavailable for Function Apps. On any failure, redirect to another region? (TBD)

Azure Service Bus


Standard tier plan; no Geo-Replication approach is available for this tier.

Azure Storage Account


Geo-Replication available.
Redundancy : Read-access geo-zone-redundant storage (RA-GZRS)
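
With read-access redundancy (RA-GZRS), the replicated data can be read from a secondary endpoint whose host name is the account name with a "-secondary" suffix. The sketch below derives both blob endpoints. The account name used in the example is hypothetical.

```python
def blob_endpoints(account_name: str) -> dict[str, str]:
    """Return primary and read-only secondary blob endpoints for RA-GZRS."""
    return {
        "primary": f"https://{account_name}.blob.core.windows.net",
        # Read-only endpoint served from the secondary region.
        "secondary": f"https://{account_name}-secondary.blob.core.windows.net",
    }
```

For a hypothetical account stexample, reads during a primary outage could target https://stexample-secondary.blob.core.windows.net. Writes still require the primary or an account failover.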

DR Execution Plan

Approach 1

Critical Resources:
Critical/important resources will be pre-created in the respective DR regions (UK South --> UK West, East US --> West US) with the relevant SKU and networking rules.

• Ha-Proxy: The Ha-Proxy image for the DR region will be pre-created and stored in the prd-shr-image gallery. Ha-Proxy with VNet rules and SKU will be pre-deployed in the DR location.
• Azure Kubernetes Service (AKS): AKS will be pre-deployed with networking rules and the relevant SKU in the DR location.
• SQL Server: SQL Server will be pre-deployed with networking rules and the relevant SKU in the DR location.

Other Resources:
Other DR resources will be rolled out using Terraform, and these resources will be associated with the respective networking rules and SKU.
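
The split between pre-created and on-demand resources in Approach 1 can be written down as data, so that a runbook or script can list what still has to be created when DR is invoked. This is a minimal sketch: the class and function names are illustrative, and the three "other" entries are an assumption drawn from the Region Specific Services list, since the plan does not enumerate them here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrResource:
    name: str
    pre_created: bool  # True: deployed in the DR region ahead of time


# Critical resources per Approach 1, plus assumed examples of other resources.
RESOURCES = [
    DrResource("HaProxy", pre_created=True),
    DrResource("AKS", pre_created=True),
    DrResource("SQL Server", pre_created=True),
    DrResource("Service Bus", pre_created=False),      # assumed example
    DrResource("Function Apps", pre_created=False),    # assumed example
    DrResource("Storage account", pre_created=False),  # assumed example
]


def to_create_during_dr(resources: list[DrResource]) -> list[str]:
    """Names of resources that still have to be created when DR is invoked."""
    return [r.name for r in resources if not r.pre_created]
```

Under Approach 2, every entry would have pre_created=False and the whole list would be created during DR.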

Approach 2
Create all Azure resources from scratch and configure.
Approx. ETA - 8 hrs.

References

Resource/ Service                         Reference
Azure SLA                                 https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1&year=2024
KeyVault                                  https://learn.microsoft.com/en-us/azure/key-vault/general/disaster-recovery-guidance
Azure storage DR planning and failover    https://learn.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance
                                          https://learn.microsoft.com/en-us/azure/storage/common/storage-failover-private-endpoints
Azure Storage redundancy                  https://learn.microsoft.com/en-us/azure/storage/common/storage-redundancy
Azure paired regions                      https://learn.microsoft.com/en-us/azure/reliability/cross-region-replication-azure#azure-paired-regions
