So, there are three key questions we should be asking:

What are you doing for your SharePoint disaster recovery?
Do you have separate farms, availability groups, a stretched farm, or nothing?
What is SharePoint disaster recovery?

When asked, these three questions will return multiple answers that will probably be different every time you ask them. In reality, there are multiple ways of creating a disaster recovery solution for an On-Premises SharePoint solution. The most recent ones I have worked on range from a stretched SharePoint farm using SQL availability groups across multiple data centers to simply creating a cold or warm standby farm in a different data center to fail over to when needed. There are merits to each of these, but one option that is fast becoming popular, especially for those on SharePoint 2013 or SharePoint 2016, is to offload your disaster recovery solution to the Microsoft Azure cloud. This frees up server resources that you would otherwise house On-Premises and gives you the benefits of cloud services.

What are the options for Microsoft Azure SharePoint Disaster Recovery?

In reality, the same options that you can create On-Premises can be created using Microsoft Azure, albeit with some limitations.

  1. Azure Infrastructure Services
    • Hot Recovery
    • Warm Recovery
    • Cold Recovery
  2. Azure Site Recovery

These two core options allow for great flexibility in choosing a solution. 

Azure Infrastructure Services

The Microsoft Azure cloud has had the ability to create full server environments for quite some time now. The ability to create Virtual Machines and full server farms, paying only for the resources that you need, is not only great for Development but also a great option for disaster recovery. Using this service allows us to create three different recovery types depending on what is needed.

Whichever of these options you choose to build, the following items are needed:

  1. On-Premises production SharePoint Farm
  2. Recovery production SharePoint Farm in Microsoft Azure
  3. Site-to-Site VPN connection between both SharePoint Farms

A sample design provided by Microsoft could be as simple as this.

The key here is the bandwidth and VPN connection between the two locations. Azure ExpressRoute lets you create private connections between Azure datacenters and infrastructure on your premises or in a colocation environment. ExpressRoute connections don't go over the public Internet. They offer more reliability, faster speeds, lower latencies, and higher security than typical Internet connections. In some cases, using ExpressRoute connections to transfer data between on-premises systems and Azure can yield significant cost benefits. ExpressRoute gives you a fast and reliable connection to Azure, making it suitable for scenarios like periodic data migration, replication for business continuity, disaster recovery, and other high-availability strategies. It can be a cost-effective option for transferring large amounts of data, such as datasets for high-performance computing applications, or moving large virtual machines between your Dev-Test environment in an Azure virtual private cloud and your on-premises production environment.

Using ExpressRoute not only makes Microsoft Azure a valid approach but almost begs the question: why are we not using it now?

In order to use this approach, Distributed File System Replication (DFSR) needs to be implemented within both the On-Premises and Recovery infrastructure. DFSR is how database files, etc., are moved from one location to another, which limits the need for technologies such as SQL replication. This will produce better performance and give you better control over the failover process.

With this implemented on both sides, DFSR will transfer the log files from the Production environment up into the recovery environment over the ExpressRoute link. These logs are then stored within the DFSR file system and replayed by SQL Server into the recovery environment, bringing it back up to date with the production environment. The downside here is that the content databases that reside in the recovery farm are not attached to SharePoint directly until a recovery process has been performed, with logs replayed and so on. The recovery process involves:

  1. Stopping log shipping
  2. Stopping incoming traffic to the production farm
  3. Replaying the final transaction log that was copied using DFSR
  4. Attaching the content databases to the SharePoint farm
  5. Restoring any service applications to the recovery servers
  6. Updating the DNS records so traffic now goes to the recovery servers in Azure

These are the basic steps, but as you can imagine this could be a lengthy process, and it is not as automatic as you may need or expect.
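To make the log replay part of the recovery more concrete, here is a minimal sketch in Python that builds the T-SQL needed to restore a set of copied transaction log backups in order. The database name, folder path, and file naming scheme are assumptions made up for illustration (the sketch assumes file names sort chronologically); it only generates the statements and does not connect to SQL Server.

```python
from pathlib import PureWindowsPath


def build_restore_statements(database, log_backups):
    """Return T-SQL that replays log backups in order, recovering on the last one.

    Assumption of this sketch: the backup file names sort chronologically,
    for example WSS_Content_20170301_0115.trn.
    """
    ordered = sorted(log_backups, key=lambda p: PureWindowsPath(p).name)
    statements = []
    for index, path in enumerate(ordered):
        is_last = index == len(ordered) - 1
        # Every log except the last leaves the database in a restoring state
        # so that later logs can still be applied; the final one brings it online.
        mode = "RECOVERY" if is_last else "NORECOVERY"
        statements.append(
            f"RESTORE LOG [{database}] FROM DISK = N'{path}' WITH {mode};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical files sitting in the DFSR-replicated folder.
    logs = [
        r"D:\DFSR\Logs\WSS_Content_20170301_0130.trn",
        r"D:\DFSR\Logs\WSS_Content_20170301_0115.trn",
    ]
    for statement in build_restore_statements("WSS_Content", logs):
        print(statement)
```

Restoring the final log WITH RECOVERY is what makes the content database usable again, which is why it has to happen before the database is attached to the recovery farm.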

Now of course, though the process is the same, using a Hot versus a Warm or Cold recovery environment changes the approach slightly, as the servers may or may not be online, which adds extra steps to the recovery process. Looking at each option, we can see pros and cons to each.

Cold standby disaster recovery strategy

A business ships backups to support bare metal recovery to local and regional offsite storage regularly, and has contracts in place for emergency server rentals in another region. 

Pros: Often the cheapest option to maintain, operationally.

Cons: The slowest option to recover. Often an expensive option to recover, because it requires that physical servers be configured correctly after a disaster has occurred.

Warm standby disaster recovery strategy

A business ships backups or virtual machine images to local and regional disaster recovery farms. 

Pros: Often fairly inexpensive to recover, because a virtual server farm can require little configuration upon recovery. 

Cons: Can be very expensive and time-consuming to maintain.

Hot standby disaster recovery strategy

A business runs multiple data centers, but serves content and services through only one data center. 

Pros: Often fairly fast to recover. 

Cons: Can be very expensive to configure and maintain.

To read more see here: https://technet.microsoft.com/en-us/library/ff628971.aspx
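As a rough illustration of how the standby type changes the manual failover, the sketch below builds a runbook from the basic recovery steps described earlier and prepends extra preparation steps when the recovery servers are not already running. The preparation steps for warm and cold standby are my own assumptions for illustration, not a prescribed Microsoft procedure.

```python
# The basic failover steps, in the order described in this article.
BASE_STEPS = [
    "Stop log shipping",
    "Stop accepting incoming traffic to the production farm",
    "Replay the final transaction log copied by DFSR",
    "Attach the content databases to the recovery SharePoint farm",
    "Restore service applications on the recovery servers",
    "Update DNS records to point at the recovery servers in Azure",
]

# Assumed extra preparation per standby type; illustrative only.
PREPARATION = {
    "hot": [],
    "warm": ["Start the recovery virtual machines"],
    "cold": [
        "Provision or restore the recovery servers",
        "Start the recovery virtual machines",
    ],
}


def build_runbook(standby_type):
    """Return the ordered failover steps for 'hot', 'warm', or 'cold'."""
    key = standby_type.lower()
    if key not in PREPARATION:
        raise ValueError(f"Unknown standby type: {standby_type!r}")
    return PREPARATION[key] + BASE_STEPS


if __name__ == "__main__":
    for number, step in enumerate(build_runbook("warm"), start=1):
        print(f"{number}. {step}")
```

Writing the steps down in this form makes it obvious that every step is still a manual action someone has to perform during an outage.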

If you determine through your due diligence that you want a more automated approach, then Azure Site Recovery might be the option. Site Recovery is a fantastic option but comes at a cost as far as understanding and setup go. 

Azure Site Recovery enables you to deploy application-aware, on-demand availability solutions. Be it Windows Server or Linux based applications, Microsoft first party enterprise applications or offerings from other vendors, you can use Azure Site Recovery to enable disaster recovery, deploy on-demand Dev/Test environments or migrate them to Azure. ASR replication technologies can protect entire virtual machines with all their disks and data. This allows it to be compatible with any application running on the machine. Applications such as SharePoint, Exchange, Dynamics, and SQL Server can all take advantage of ASR.

So how does Azure Site Recovery (ASR) work?

Firstly, your Production On-Premises environment can be either Physical or Virtual Machines, as ASR is able to protect both. With this environment configured, you can then set up ASR and have it create snapshots of the Production Servers. These are created as VHD files, so in effect Virtual Machines. So now you may see how this is going to work.

The great news here is that ASR supports more than just the Microsoft stack: as well as Hyper-V or Physical Servers, it also supports VMware as the Production environment that you wish to protect.

These VHD files are pushed out on demand in the event of a failure and spun back up as if the Production environment had always lived within Microsoft Azure. Of course, there are a few things that need configuring afterwards to ensure it is all working, but the goal here is that it is fully automated. The big note here is that Active Directory and DNS also need to be part of this. As the name implies, it is a “site” recovery, so everything that is needed for the SharePoint solution to work also needs to be protected by ASR.

The configuration between the two sites relies on either replicating the current Active Directory environment to a secondary site, or standing up an additional Domain Controller in the failover site.

The ASR configuration is done using the Azure Portal and requires you to create the target for where the environment needs to be moved to. This could be another On-Premises data center or Azure, which makes it a very scalable solution. While creating the plan, you can specify the servers (Virtual Machines) and see the steps needed to prepare, as well as the steps it will follow as part of the failover.
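Before building the plan in the portal, it can help to write down which servers belong in it and in what order they should come up. The sketch below is only a planning aid in Python and is not the ASR API: the role names, the server names, and the start order (domain controller, then SQL Server, then the SharePoint tiers) are assumptions for illustration.

```python
# Assumed start order: identity and name resolution first, then SQL Server,
# then the SharePoint tiers that depend on both.
START_ORDER = ["domain_controller", "sql", "sharepoint_app", "sharepoint_web"]


def plan_groups(servers, dc_already_at_target=False):
    """Group (name, role) pairs into ordered failover groups.

    Raises ValueError for an unknown role, or when no domain controller is
    protected and none already exists in the failover site.
    """
    groups = {role: [] for role in START_ORDER}
    for name, role in servers:
        if role not in groups:
            raise ValueError(f"Unknown role for {name}: {role}")
        groups[role].append(name)
    if not groups["domain_controller"] and not dc_already_at_target:
        raise ValueError("No domain controller in the plan or at the target site")
    # Keep only the roles that actually have servers, in start order.
    return [(role, sorted(groups[role])) for role in START_ORDER if groups[role]]


if __name__ == "__main__":
    # Hypothetical server names.
    servers = [
        ("WFE01", "sharepoint_web"),
        ("SQL01", "sql"),
        ("APP01", "sharepoint_app"),
        ("DC01", "domain_controller"),
    ]
    for role, names in plan_groups(servers):
        print(role, names)
```

The `dc_already_at_target` flag covers the case where an additional Domain Controller has been stood up in the failover site instead of being replicated.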

Once it has all been configured, you can then initiate a planned or unplanned failover to move the sites to the secondary location. You can also change the direction of the environments at any point.

To read more download the documentation provided by Microsoft: https://gallery.technet.microsoft.com/SharePoint-DR-Solution-f6b4aeae 

To see a demo of this approach, head over to Channel 9 and watch the Ignite session provided by Microsoft, Best Practices for Deploying Disaster Recovery Services with Microsoft Azure Site Recovery:

https://channel9.msdn.com/Events/Ignite/2015/BRK3503

So as an organization, you have multiple options: the standard approaches, albeit extended to Azure, or what I think is the more complicated but better solution, a full site recovery, which will give you greater protection.

Resources