PROD - DR Plan - Overview
PROD - DR Plan
Last updated by | Shreyans Jain | Mar 26, 2024 at 5:28 PM GMT+5:30
Contents
• General
• Global Resources
• Azure KeyVault
• Azure App Configuration
• Azure Cosmos Mongo DB
• Backup
• Azure Storage account
• Backup
• Region Specific Services
• Azure SQL Server
• DR Approach 1
• DR Approach 2
• DB Backup
• Failover groups
• HaProxy
• Azure Kubernetes Service (AKS)
• Azure Function App
• Azure Service Bus
• Azure Storage Account
• DR Execution Plan
• Approach 1
• Critical Resources:
• Other Resources:
• Approach 2
• References
General
Primary and Secondary Azure regions
Geography | Primary region | Secondary region
UK | UK South | UK West
US | East US | West US
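The region pairing above can be kept as a small lookup so that scripts resolve the DR region consistently. This is a minimal sketch; the names `DR_REGION_PAIRS` and `dr_region_for` are illustrative and not part of the existing tooling.

```python
# Primary -> secondary (DR) region pairs, as listed in this plan.
# The dictionary and function names are hypothetical.
DR_REGION_PAIRS = {
    "UK South": "UK West",
    "East US": "West US",
}


def dr_region_for(primary: str) -> str:
    """Return the DR (secondary) region for a given primary region."""
    try:
        return DR_REGION_PAIRS[primary]
    except KeyError:
        raise ValueError(f"No DR region defined for {primary!r}") from None
```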
https://dev.azure.com/Aptean/Paragon/_wiki/wikis/Paragon.wiki/4403/PROD-DR-Plan 1/6
4/11/24, 8:19 PM PROD - DR Plan - Overview
Plan #1
We will follow a hybrid DR plan: some resources are created along with the Prod infrastructure, and the
remaining resources and services are created during the DR process.
The following resources will be created before DR:
Vnet/Subnet rules
HaProxy
HaProxy DR image in the Prod Global image gallery (one-time activity)
AKS
SQL Server in the secondary region with 50% of the primary region's processing power.
Plan #2
Global Resources
Azure Cosmos Mongo DB
Azure KeyVault
Azure AppConfig
Azure Storage account
Azure KeyVault
Geo-Replication not available.
References
https://learn.microsoft.com/en-us/azure/key-vault/general/disaster-recovery-guidance
https://learn.microsoft.com/en-us/azure/reliability/cross-region-replication-azure#azure-paired-regions
- As there is no replication by Azure for this service, and all the KeyVault values are available in
- For now, we are thinking of recreating the KeyVault, uploading the secrets, mapping them with AppConfig and restarting the pods.
- What is the plan to update the secondary region's connection string?
Once the above is decided, recreate the KeyVault, upload the secrets, map them with AppConfig and restart the pods.
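The secret re-upload step could be sketched as below. It assumes the secret names and values are already available from a trusted source and that the client exposes `set_secret(name, value)`, as `azure.keyvault.secrets.SecretClient` does. The `SecretWriter` protocol and `upload_secrets` helper are hypothetical names used only for illustration.

```python
from typing import Mapping, Protocol


class SecretWriter(Protocol):
    """Anything with a set_secret(name, value) method, e.g. a Key Vault SecretClient."""

    def set_secret(self, name: str, value: str) -> object: ...


def upload_secrets(client: SecretWriter, secrets: Mapping[str, str]) -> list[str]:
    """Write every secret to the recreated vault and return the names uploaded."""
    uploaded = []
    for name in sorted(secrets):  # sorted for a predictable, repeatable order
        client.set_secret(name, secrets[name])
        uploaded.append(name)
    return uploaded
```

Returning the uploaded names makes it easy to compare against the expected list before mapping the vault with AppConfig and restarting the pods.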
Azure Cosmos Mongo DB
Backup
Backup policy mode: Periodic
Backup storage: Geo-redundant
Default consistency: Session
DB backup interval: 12 hrs
DB retention policy: 30 days
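As a rough worked example of what these settings imply: with periodic backups, a restore can lose up to one full interval of writes, and the number of backups held is the retention divided by the interval. The sketch below is plain arithmetic on the values above. It assumes backups are taken exactly on schedule, and the function names are illustrative.

```python
from datetime import timedelta

# Values taken from the backup settings in this plan.
BACKUP_INTERVAL = timedelta(hours=12)
RETENTION = timedelta(days=30)


def worst_case_data_loss(interval: timedelta) -> timedelta:
    """Worst case: the failure happens just before the next backup is taken."""
    return interval


def retained_backup_count(interval: timedelta, retention: timedelta) -> int:
    """Number of whole backup intervals that fit into the retention window."""
    return retention // interval
```

With a 12-hour interval and 30-day retention, this gives a worst-case loss window of 12 hours and 60 retained backups.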
Azure Storage account
Backup
Enable Azure Backup for blobs with 30-day retention? (to be confirmed)
Region Specific Services
HaProxy VMSS
AKS
Load balancer
SQL Server
SQL Database
Service Bus
Event Grid
SignalR
Region-specific cluster: upon failure, route traffic to the nearest cluster, or re-run the Terraform for the
same or a new region.
Azure SQL Server
DR Approach 1
The suggestion from the DBA team (current practice) is to use the 'Failover group' option for the SQL Server.
Prod and DR SQL DBs will be active <-> active.
The DBA team is responsible for manually switching the DB pointing from the primary --> secondary region.
The secondary region's processing power will be 50% of the primary region's processing power; the DBA team
can increase it as needed.
DR Approach 2
Manually restore the backed-up DBs individually.
DB Backup
Geo-redundant backup storage.
DB Backup Interval - 12 hrs (to be confirmed)
DB Retention policy - 30 days (to be confirmed)
Failover groups
Read/Write failover policy : Customer managed
Read/Write grace period : 1hr
Reference : https://learn.microsoft.com/en-us/azure/azure-sql/database/failover-group-sql-db?view=azuresql
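Because the failover policy is customer managed, someone has to trigger the switch. One way to do this is the Azure CLI command `az sql failover-group set-primary`, run against the secondary server. The sketch below only builds that command line; it does not run it. The resource group, server and failover group names are placeholders, not the real Prod values, and the helper name is hypothetical.

```python
import shlex


def failover_command(resource_group: str, secondary_server: str,
                     failover_group: str, allow_data_loss: bool = False) -> list[str]:
    """Build the CLI call that promotes the secondary server to primary."""
    cmd = [
        "az", "sql", "failover-group", "set-primary",
        "--resource-group", resource_group,
        "--server", secondary_server,  # the server that should become primary
        "--name", failover_group,
    ]
    if allow_data_loss:
        # Forced failover; only for cases where the primary is unreachable.
        cmd.append("--allow-data-loss")
    return cmd


if __name__ == "__main__":
    # Placeholder names for illustration only.
    print(shlex.join(failover_command("rg-dr-example", "sql-dr-example", "fg-example")))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` later without shell quoting issues.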
HaProxy
Separate HaProxy for each deployment region.
Approach 1
The DR HaProxy will be created beforehand, during Prod infra creation, to avoid a longer outage while
re-creating the HaProxy. (Hybrid Approach)
Approach 2
The DR HaProxy will be created during the DR process (approx. creation time: 30-45 mins).
Azure Kubernetes Service (AKS)
Approach 1
The DR AKS will be created beforehand, during Prod infra creation, to avoid a longer outage while
re-creating the AKS. (Hybrid Approach)
Approach 2
DR Execution Plan
Approach 1
Critical Resources:
Critical/important resources will be pre-created in their respective DR regions (UK South --> UK West, East US -->
West US) with the relevant SKU and networking rules.
HaProxy: the HaProxy image for the DR region will be pre-created and stored in the prd-shr-image gallery. HaProxy
will be pre-deployed in the DR location with VNet rules and the relevant SKU.
Azure Kubernetes Service (AKS): AKS will be pre-deployed in the DR location with networking rules and the
relevant SKU.
SQL Server: will be pre-deployed in the DR location with networking rules and the relevant SKU.
Other Resources:
Other DR resources will be rolled out using Terraform, and these resources will be associated with the respective
networking rules and SKU.
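The split between pre-created and on-demand resources can be written down as a simple checklist helper. This is a minimal sketch: the set reflects the resources this plan names as pre-created, while the names `PRE_CREATED` and `split_by_phase` are hypothetical.

```python
# Resources this plan pre-creates in the DR region (hybrid approach).
PRE_CREATED = {"Vnet/Subnet rules", "HaProxy", "HaProxy DR image", "AKS", "SQL Server"}


def split_by_phase(resources: list[str]) -> tuple[list[str], list[str]]:
    """Split resources into (already pre-created, to be created via Terraform during DR)."""
    pre_created = [r for r in resources if r in PRE_CREATED]
    during_dr = [r for r in resources if r not in PRE_CREATED]
    return pre_created, during_dr
```

For example, `split_by_phase(["AKS", "Service Bus"])` places AKS in the first list and Service Bus in the second.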
Approach 2
Create and configure all Azure resources from scratch.
Approx. ETA - 8 hrs.
References
Azure SLA | https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1&year=2024
KeyVault | https://learn.microsoft.com/en-us/azure/key-vault/general/disaster-recovery-guidance
Azure Storage redundancy | https://learn.microsoft.com/en-us/azure/storage/common/storage-redundancy