Azure Site Recovery with Terraform and Azure DevOps
Azure Site Recovery - Intro
Azure Site Recovery (ASR) is a service within Azure Native aimed at providing a cost-efficient and easy-to-configure disaster recovery solution for Azure virtual machines. It is one of the important aspects of getting to know and mastering Azure business continuity services. Configuring ASR ranges from easy to slightly challenging, depending on the requirements the business has for protection and security of data, as well as the requirements for isolating services within different subscriptions.
At larger scale, repeatable deployments of Azure Native resources should be automated to avoid unnecessary errors and delays, and to allow quick recovery of (at least) the structures if a catastrophic loss happens. One of the preferred tools for deployment automation is Terraform, integrated into Azure DevOps pipelines.
Main Aspects of Azure Site Recovery
The main components that you have to be concerned with when working with ASR are:
– The recovery vault – this is the service that holds all automation and metadata for the replication including replication policies, etc.
– The cache storage account – the Blob storage account used for gathering all replication snapshots performed by the ASR service
– Replication fabrics, protection containers and replication policies – which we put in a single point, as they do not need much specific planning and implementation but are an integral part of the solution
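To make these components more tangible, here is a minimal Terraform sketch of how they could be declared with the azurerm provider. All names, the variables and the policy values are our own placeholders, not a recommended configuration, and a single resource group is assumed only to keep the example short.

```hcl
# Hypothetical inputs; adjust to your own naming and layout.
variable "resource_group_name" { type = string }
variable "vault_location"      { type = string } # region hosting the vault
variable "source_location"     { type = string } # region of the protected VMs

# The recovery vault: holds the replication metadata and policies.
resource "azurerm_recovery_services_vault" "asr" {
  name                = "rsv-asr-example"
  location            = var.vault_location
  resource_group_name = var.resource_group_name
  sku                 = "Standard"
}

# The cache storage account, placed in the same region as the source VMs.
resource "azurerm_storage_account" "cache" {
  name                     = "stasrcacheexample"
  location                 = var.source_location
  resource_group_name      = var.resource_group_name
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

# One replication fabric per region (only the source side is shown).
resource "azurerm_site_recovery_fabric" "source" {
  name                = "fabric-source"
  resource_group_name = var.resource_group_name
  recovery_vault_name = azurerm_recovery_services_vault.asr.name
  location            = var.source_location
}

# Protection container inside the source fabric.
resource "azurerm_site_recovery_protection_container" "source" {
  name                 = "container-source"
  resource_group_name  = var.resource_group_name
  recovery_vault_name  = azurerm_recovery_services_vault.asr.name
  recovery_fabric_name = azurerm_site_recovery_fabric.source.name
}

# Replication policy; retention and snapshot frequency are example values.
resource "azurerm_site_recovery_replication_policy" "default" {
  name                                                 = "policy-default"
  resource_group_name                                  = var.resource_group_name
  recovery_vault_name                                  = azurerm_recovery_services_vault.asr.name
  recovery_point_retention_in_minutes                  = 24 * 60
  application_consistent_snapshot_frequency_in_minutes = 4 * 60
}
```

A matching fabric and protection container would be declared for the target region in the same way.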
Obviously, you do not integrate that type of service without thinking about design of all aspects: networking, security, deployments, resourcing, continuity.
So, the first step in deploying ASR for a virtual machine is to think about the infrastructure supporting that configuration.
– Are you deploying a separate VNET for your Recovery Vault?
– Are you using private endpoints to connect that vault to your virtual machines?
– How are you going to secure your communication? Are you deploying NSGs?
– Are you using private peering between the Recovery Vault VNET and your application VNETs?
– Do you use a single cache storage within the subscription or one dedicated for each environment?
– Do you need the cache storage account to be in a different subscription than the servers it protects?
These are important questions that require careful planning before you start any deployments with Terraform.
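As an illustration of where the answers end up, the sketch below shows one possible outcome: a dedicated ASR VNET with a subnet, an NSG and a peering to an existing application VNET. It assumes plain VNET peering and a very simple inbound rule; the names, address ranges and the rule itself are placeholders and not a security recommendation.

```hcl
# Hypothetical inputs; names and address ranges are placeholders.
variable "resource_group_name" { type = string }
variable "location"            { type = string }
variable "app_vnet_id"         { type = string } # existing application VNET

resource "azurerm_virtual_network" "asr" {
  name                = "vnet-asr-example"
  location            = var.location
  resource_group_name = var.resource_group_name
  address_space       = ["10.50.0.0/24"]
}

resource "azurerm_subnet" "asr" {
  name                 = "snet-asr"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.asr.name
  address_prefixes     = ["10.50.0.0/26"]
}

# Example NSG: allow HTTPS from the local and peered VNETs only.
resource "azurerm_network_security_group" "asr" {
  name                = "nsg-asr-example"
  location            = var.location
  resource_group_name = var.resource_group_name

  security_rule {
    name                       = "allow-https-from-vnets"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "443"
    source_address_prefix      = "VirtualNetwork"
    destination_address_prefix = "*"
  }
}

resource "azurerm_subnet_network_security_group_association" "asr" {
  subnet_id                 = azurerm_subnet.asr.id
  network_security_group_id = azurerm_network_security_group.asr.id
}

# Peering from the ASR VNET to the application VNET.
# The reverse peering has to be created on the application VNET as well.
resource "azurerm_virtual_network_peering" "asr_to_app" {
  name                         = "peer-asr-to-app"
  resource_group_name          = var.resource_group_name
  virtual_network_name         = azurerm_virtual_network.asr.name
  remote_virtual_network_id    = var.app_vnet_id
  allow_virtual_network_access = true
}
```

Whether an NSG is actually enforced for private endpoints depends on the private endpoint network policy setting of the subnet, so check that before relying on the rule above.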
Terraform Deployments
We will not go into detail about how to set up the basic infrastructure that integrates Terraform workers with Azure DevOps pipelines, but we will point out the main things you need in order to start:
– Azure DevOps account
– Key Vault to hold all sensitive information related to the deployment
– Storage Account to hold Terraform state
– Pipeline capacity: running 2 pipelines at the same time is free of charge for the moment, but that could change in the future
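A minimal sketch of how the state storage and the vault for sensitive information could be wired into Terraform is shown below. It assumes the vault is an Azure Key Vault; every resource group, account, container, vault and secret name is a hypothetical placeholder.

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # Pin a provider version here that suits your environment.
    }
  }

  # Terraform state kept in a Storage Account (placeholder names).
  backend "azurerm" {
    resource_group_name  = "rg-tfstate-example"
    storage_account_name = "sttfstateexample"
    container_name       = "tfstate"
    key                  = "asr/common.tfstate" # one key per folder / pipeline
  }
}

provider "azurerm" {
  features {}
}

# Sensitive values are read from the Key Vault instead of living in code.
data "azurerm_key_vault" "deploy" {
  name                = "kv-deploy-example"
  resource_group_name = "rg-tfstate-example"
}

data "azurerm_key_vault_secret" "example" {
  name         = "example-secret" # hypothetical secret name
  key_vault_id = data.azurerm_key_vault.deploy.id
}
```

Using a different state key for each folder keeps the state of each pipeline separate.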
Before you start deploying, make sure you break down the critical parts of your code into separate folders and pipelines, so that changes within your Terraform code affect only the part in focus and not everything else.
Another important approach is to put the more general parts of your code into separate files that are in place for all configurations, and to have just one file or module that relates to the configuration of ASR for a VM or a group of similar VMs (for example, VMs used by the same application). For instance:
– Define your ASR VNET, subnets, NSGs within one file as this will be valid for every implementation
– Put the replication fabrics deployment code there as well, as you will need a single replication fabric per region for all your ASR implementations
– Within a slightly more specific module you can put the deployment of the cache storage and the replication policies, as they will most probably be valid on a group level – for instance, when you deploy everything for all your DB servers
– The final layer is your most specific file, which will be for the VM or VMs themselves. Define here:
o Source and target application VNETs – or just describe them if you are not deploying them now
o Disk types – target and replication
o Private endpoints for the cache storage
o Private DNS zones
o Containers and container mappings
o Network mapping as well
o At the end – the actual ASR configuration itself for the VM / VMs in question
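The most specific layer described above could be sketched roughly as follows. Every variable is a hypothetical input that would come from the more general layers, the resource names are placeholders, and only one disk and one network interface are shown.

```hcl
# Hypothetical inputs provided by the more general layers.
variable "vault_rg_name"            { type = string }
variable "vault_name"               { type = string }
variable "source_fabric_name"       { type = string }
variable "target_fabric_name"       { type = string }
variable "target_fabric_id"         { type = string }
variable "source_container_name"    { type = string }
variable "target_container_id"      { type = string }
variable "replication_policy_id"    { type = string }
variable "cache_storage_account_id" { type = string }
variable "source_vnet_id"           { type = string }
variable "target_vnet_id"           { type = string }
variable "target_subnet_name"       { type = string }
variable "target_resource_group_id" { type = string }
variable "source_vm_id"             { type = string }
variable "source_os_disk_id"        { type = string }
variable "source_nic_id"            { type = string }
variable "target_disk_type"         { type = string } # e.g. "Premium_LRS"
variable "target_replica_disk_type" { type = string } # e.g. "Premium_LRS"

# Container mapping: ties source container, target container and policy together.
resource "azurerm_site_recovery_protection_container_mapping" "app" {
  name                                      = "mapping-app-example"
  resource_group_name                       = var.vault_rg_name
  recovery_vault_name                       = var.vault_name
  recovery_fabric_name                      = var.source_fabric_name
  recovery_source_protection_container_name = var.source_container_name
  recovery_target_protection_container_id   = var.target_container_id
  recovery_replication_policy_id            = var.replication_policy_id
}

# Network mapping between the source and target application VNETs.
resource "azurerm_site_recovery_network_mapping" "app" {
  name                        = "netmap-app-example"
  resource_group_name         = var.vault_rg_name
  recovery_vault_name         = var.vault_name
  source_recovery_fabric_name = var.source_fabric_name
  target_recovery_fabric_name = var.target_fabric_name
  source_network_id           = var.source_vnet_id
  target_network_id           = var.target_vnet_id
}

# The actual ASR configuration for the VM.
resource "azurerm_site_recovery_replicated_vm" "app" {
  name                                      = "replication-app-vm-example"
  resource_group_name                       = var.vault_rg_name
  recovery_vault_name                       = var.vault_name
  source_recovery_fabric_name               = var.source_fabric_name
  source_vm_id                              = var.source_vm_id
  source_recovery_protection_container_name = var.source_container_name
  recovery_replication_policy_id            = var.replication_policy_id
  target_resource_group_id                  = var.target_resource_group_id
  target_recovery_fabric_id                 = var.target_fabric_id
  target_recovery_protection_container_id   = var.target_container_id

  managed_disk {
    disk_id                    = var.source_os_disk_id
    staging_storage_account_id = var.cache_storage_account_id # cache storage
    target_resource_group_id   = var.target_resource_group_id
    target_disk_type           = var.target_disk_type
    target_replica_disk_type   = var.target_replica_disk_type
  }

  network_interface {
    source_network_interface_id = var.source_nic_id
    target_subnet_name          = var.target_subnet_name
  }

  # Replication can only start once both mappings exist.
  depends_on = [
    azurerm_site_recovery_protection_container_mapping.app,
    azurerm_site_recovery_network_mapping.app,
  ]
}
```

Which disk type values are accepted, including the zone-redundant ones, depends on the version of the azurerm provider you are running, so check the resource documentation for your version.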
Azure Site Recovery Deployment Specifics
There are a few things, amongst others, that you have to be aware of when you deploy ASR configurations; they could save you a bit of trouble if you are just getting started:
– The cache storage account needs to be in the SAME subscription as the virtual machines when you deploy it, no matter what the documentation says:
https://learn.microsoft.com/en-us/azure/site-recovery/azure-to-azure-support-matrix#cache-storage
Now, if you really need to move it outside of that subscription, what we have seen working is moving it AFTER the deployment of ASR and the configuration of all replications. However, after that it cannot be used for new configurations from the other subscription. So, if you can, live with a cache storage account in the same subscription.
– The Terraform azurerm_site_recovery_replicated_vm resource has only recently started supporting zone-redundant (ZRS) target and replica disks. The great irony was that Terraform would not allow you to select them, while the Azure API itself would not allow you to replicate ZRS to anything other than ZRS. This is hopefully fixed now for the long term
– Configure Site Recovery vault in the source region – this is the only supported configuration
– If you are using private endpoints and cannot integrate Azure DNS servers, use host file entries for:
o The cache storage DNS entry
o The Recovery Vault Private Endpoint DNS entry (this is really important!!)
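For the host file approach, Terraform can help collect the entries. The sketch below declares private endpoints for the cache storage and the vault without a private DNS zone group and outputs the FQDN and IP pairs the endpoints report. The variables and names are hypothetical, and we assume that custom_dns_configs is populated when no DNS zone group is attached, so verify the output in your own environment.

```hcl
# Hypothetical inputs; names are placeholders.
variable "resource_group_name"      { type = string }
variable "location"                 { type = string }
variable "endpoint_subnet_id"       { type = string }
variable "cache_storage_account_id" { type = string }
variable "recovery_vault_id"        { type = string }

# Private endpoint for the blob service of the cache storage account.
resource "azurerm_private_endpoint" "cache" {
  name                = "pe-asr-cache-example"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.endpoint_subnet_id

  private_service_connection {
    name                           = "psc-asr-cache"
    private_connection_resource_id = var.cache_storage_account_id
    subresource_names              = ["blob"]
    is_manual_connection           = false
  }
}

# Private endpoint for the Site Recovery part of the vault.
resource "azurerm_private_endpoint" "vault" {
  name                = "pe-asr-vault-example"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.endpoint_subnet_id

  private_service_connection {
    name                           = "psc-asr-vault"
    private_connection_resource_id = var.recovery_vault_id
    subresource_names              = ["AzureSiteRecovery"]
    is_manual_connection           = false
  }
}

# One "<ip> <fqdn>" line per DNS record, ready to paste into a host file.
output "hosts_file_entries" {
  value = flatten([
    for pe in [azurerm_private_endpoint.cache, azurerm_private_endpoint.vault] : [
      for cfg in pe.custom_dns_configs : [
        for ip in cfg.ip_addresses : "${ip} ${cfg.fqdn}"
      ]
    ]
  ])
}
```

The vault endpoint usually reports several FQDNs, which is one more reason not to type these entries by hand.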
If you manage to avoid these pitfalls, you should generally be fine.
However, if you are still struggling, or you just need a partner for your DR deployments in Azure, get in touch through our contacts and we can:
– Create the end to end design
– Help you deploy your DevOps environment
– Help you set up Terraform and your pipelines
– Help you deploy ASR