Test DR by using failover or restore

In this tutorial, we will learn how to test disaster recovery (DR) by using failover or restore.

Azure SQL Database offers the following capabilities for recovering from an outage:

Active geo-replication
Auto-failover groups
Geo-restore
Zone-redundant databases

Prepare for the event of an outage

To recover successfully to another data region using either failover groups or geo-redundant backups, you need to prepare a server in another data center to become the new primary server should the need arise. You also need well-defined steps that are documented and tested to ensure a smooth recovery. These preparation steps include:

Firstly, identify the server in another region to become the new primary server. For geo-restore, this is generally a server in the region paired with the region in which your database is located. This eliminates the additional traffic cost during geo-restore operations.
Secondly, identify, and optionally define, the server-level IP firewall rules needed for users to access the new primary database.
Thirdly, determine how you are going to redirect users to the new primary server, such as by changing connection strings or by changing DNS entries.
Next, identify, and optionally create, the logins that must be present in the master database on the new primary server, and ensure these logins have appropriate permissions in the master database, if any.
After that, identify alert rules that need to be updated to map to the new primary database.
Lastly, document the auditing configuration on the current primary database.

When to initiate recovery

The recovery operation impacts the application. It requires changing the SQL connection string or redirecting through DNS, and it could result in permanent data loss. Therefore, it should be done only when the outage is likely to last longer than your application’s recovery time objective. When the application is deployed to production, you should monitor the application's health regularly and use the following data points to confirm that recovery is warranted:

Firstly, permanent connectivity failure from the application tier to the database.
Secondly, the Azure portal shows an alert about an incident in the region with broad impact.
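
The decision rule above can be sketched in code. This is a minimal illustration, not an official procedure: the class, the field names, and the five-minute threshold for treating a connectivity failure as persistent are all hypothetical. It also assumes that both data points must hold before recovery starts, which the text does not state explicitly.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class OutageSignals:
    # All field names are hypothetical, chosen for this illustration.
    connectivity_failed_for: timedelta  # how long the app tier has failed to reach the database
    regional_incident_reported: bool    # the Azure portal shows a broad-impact regional incident
    expected_outage: timedelta          # your own estimate of how long the outage will last


def recovery_warranted(signals: OutageSignals, rto: timedelta,
                       persistent_after: timedelta = timedelta(minutes=5)) -> bool:
    """Return True when both data points hold and the outage is likely to exceed the RTO."""
    # Assumption: a failure counts as persistent once it has lasted `persistent_after`.
    persistent_failure = signals.connectivity_failed_for >= persistent_after
    return (persistent_failure
            and signals.regional_incident_reported
            and signals.expected_outage > rto)
```

For example, with a one-hour recovery time objective, a 20-minute connectivity failure during a reported regional incident that is expected to last four hours would return True. The same signals with an eight-hour objective would return False, so you would wait for service recovery instead.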

Wait for service recovery

The Azure teams work diligently to restore service availability as quickly as possible, but depending on the root cause, it can take hours or days. If your application can tolerate significant downtime, you can simply wait for the recovery to complete. In this case, no action on your part is required. After the region recovers, your application’s availability is restored.

Fail over to geo-replicated secondary server in the failover group

If your application’s downtime can result in business liability, you should be using failover groups. They enable the application to quickly restore availability in a different region in case of an outage. For a tutorial, see Implement a geo-distributed database.

To restore availability of the database(s), you need to initiate the failover to the secondary server using one of the supported methods. Use one of the following guides to fail over to a geo-replicated secondary database:

Firstly, fail over to a geo-replicated secondary server using the Azure portal
Secondly, fail over to the secondary server using PowerShell
Lastly, fail over to a secondary server using Transact-SQL (T-SQL)
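
As a rough illustration of the T-SQL option for a geo-replicated secondary database, the sketch below builds the failover statement as a string. The helper name and the database name are hypothetical, and the helper only constructs the text; it does not connect to anything. The statement is meant to be run in the master database on the server that hosts the secondary. The forced variant can lose data and is intended for cases where the primary is unreachable.

```python
def failover_statement(database: str, force: bool = False) -> str:
    """Build the T-SQL that promotes a geo-replicated secondary to primary.

    Hypothetical helper: it returns the statement text and does not execute it.
    """
    # Bracket-quote the identifier and escape any closing bracket inside it.
    quoted = "[" + database.replace("]", "]]") + "]"
    # FAILOVER is the planned variant; the forced variant may lose data.
    action = "FORCE_FAILOVER_ALLOW_DATA_LOSS" if force else "FAILOVER"
    return f"ALTER DATABASE {quoted} {action};"


print(failover_statement("salesdb"))
print(failover_statement("salesdb", force=True))
```

During a DR drill the planned variant is normally the one to use, because it synchronizes data before the roles switch.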

Configure your database after recovery

If you are using geo-restore to recover from an outage, you must make sure that connectivity to the new databases is properly configured so that normal application function can be resumed. The following checklist of tasks gets your recovered database production ready.

Update connection strings

Because your recovered database resides in a different server, you need to update your application’s connection string to point to that server.
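
A minimal sketch of that update follows, assuming an ADO.NET-style connection string made of semicolon-separated key=value pairs. The function name and the server names are hypothetical, and values that themselves contain a quoted semicolon are not handled.

```python
def retarget_connection_string(conn_str: str, new_server: str) -> str:
    """Return conn_str with its Server / Data Source value replaced by new_server."""
    server_keys = {"server", "data source"}
    parts = []
    replaced = False
    for part in conn_str.split(";"):
        if not part.strip():
            continue  # skip the empty piece after a trailing semicolon
        key, sep, _value = part.partition("=")
        if sep and key.strip().lower() in server_keys:
            part = f"{key}={new_server}"
            replaced = True
        parts.append(part)
    if not replaced:
        raise ValueError("no Server or Data Source key found")
    return ";".join(parts) + ";"


old = "Server=tcp:primary-srv.database.windows.net,1433;Database=salesdb;Encrypt=True;"
print(retarget_connection_string(old, "tcp:recovery-srv.database.windows.net,1433"))
```

Keeping the server name in one configuration value, rather than scattered through the application, makes this step a single change during recovery.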

Configure firewall rules

You need to make sure that the firewall rules configured on the server and on the database match those that were configured on the primary server and primary database.
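
One way to check this is to compare the two rule sets and list what the recovered server lacks. The sketch below assumes the rules have already been exported into dictionaries that map a rule name to its start and end IP address. The function name, rule names, and addresses are hypothetical.

```python
from typing import Dict, Tuple

Rule = Tuple[str, str]  # (start IP address, end IP address)


def missing_firewall_rules(primary: Dict[str, Rule],
                           recovered: Dict[str, Rule]) -> Dict[str, Rule]:
    """Return the primary's rules that are absent from, or differ on, the recovered server."""
    return {name: ip_range
            for name, ip_range in primary.items()
            if recovered.get(name) != ip_range}


primary_rules = {
    "office": ("203.0.113.10", "203.0.113.20"),
    "app-tier": ("198.51.100.5", "198.51.100.5"),
}
recovered_rules = {
    "office": ("203.0.113.10", "203.0.113.20"),
}
print(missing_firewall_rules(primary_rules, recovered_rules))
```

Any rule returned by the comparison needs to be created or corrected on the recovered server before users are redirected to it.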

Configure logins and database users

You need to make sure that all the logins used by your application exist on the server which is hosting your recovered database.

Set up telemetry alerts

You need to make sure your existing alert rule settings are updated to map to the recovered database and the different server.

Enable auditing

If auditing is necessary to access your database, you need to enable Auditing after the database recovery.


Reference: Microsoft Documentation
