Restart an instance with a user-initiated manual failover - Azure SQL Managed Instance

Applies to: Azure SQL Managed Instance

This article explains how to restart Azure SQL Managed Instance by performing a manual user-initiated failover to a secondary compute node by using PowerShell, the Azure CLI or REST API.

It's possible to fail over a primary node in the General Purpose (GP) and Business Critical (BC) service tiers, and to manually fail over a secondary read-only replica node in the BC service tier.

Note

Failover in this article refers starting the SQL Server database engine process on a secondary node, and is not related to the cross-region failover of a failover groups.

When to use manual failover

Availability, a fundamental part of the SQL Managed Instance platform, works transparently for your database applications by providing local standby compute nodes to host the SQL Server database engine process. A failover occurs when the SQL Server database engine process is taken offline on the primary compute node and is brought online on a secondary compute node. Typically, failovers between SQL managed instance compute nodes are automatic and managed by the platform when a fault is detected, a node has degraded, or during regular monthly software updates.

The entire failover operation consists of bringing the SQL Server database engine process online on a secondary node, validating the database state, and then finally redirecting the client connections to the new primary node. Client connections only fail for a short period of time, typically under a minute, during the final step of the failover operation.

You might execute a manual failover to restart the engine process on a different node for the following reasons:

  • Failed logins, or slowness due to performance issues.
  • Testing application for failover resiliency before deploying to production.
  • Testing end-to-end systems for fault resiliency on automatic failovers.
  • Testing how failover impacts existing database sessions.
  • Query performance degradation (restarting the instance can help mitigate the performance issue).

Ensuring that your applications are failover resilient prior to deploying to production helps mitigate the risk of application faults in production and contributes to application availability for your customers.

Permissions

Users initiating a failover need to have one of the following Azure roles:

  • Subscription Owner role, or
  • SQL Managed Instance Contributor role, or
  • Custom role with the following permission:
    • Microsoft.Sql/managedInstances/failover/action

Restart an instance with a manual fail over

You can restart an instance with a manual failover by using PowerShell, the Azure CLI, or REST API.

The minimum version of Az.Sql needs to be v2.9.0.

As a pre-requirement, use the following PowerShell script to install required Azure modules. In addition, select the subscription where SQL Managed Instance you wish to failover is located.

$subscription = 'enter your subscription ID here'
Install-Module -Name Az
Import-Module Az.Accounts
Import-Module Az.Sql

Connect-AzAccount -Environment AzureChinaCloud
Select-AzSubscription -SubscriptionId $subscription

Use PowerShell command Invoke-AzSqlInstanceFailover with the following example to initiate failover of the primary node, applicable to both BC and GP service tier.

$ResourceGroup = 'enter resource group of your MI'
$ManagedInstanceName = 'enter MI name'
Invoke-AzSqlInstanceFailover -ResourceGroupName $ResourceGroup -Name $ManagedInstanceName

Use the following PowerShell command to failover read secondary node, applicable to BC service tier only.

$ResourceGroup = 'enter resource group of your MI'
$ManagedInstanceName = 'enter MI name'
Invoke-AzSqlInstanceFailover -ResourceGroupName $ResourceGroup -Name $ManagedInstanceName -ReadableSecondary

Monitor the failover

Monitoring the progress of a user-initiated failover differs for instances in the Business Critical and General Purpose service tiers.

To monitor progress of a user-initiated failover, use:

  • sys.dm_os_sys_info to check system uptime for the General Purpose service tier.
  • sys.dm_hadr_fabric_replica_states to check available replicas for the Business Critical service tier.

The final step of the failover process is the redirection of client connections to the new primary node. The short loss of connectivity from your client during the failover, typically lasting under a minute, indicates failover has succeeded - regardless of service tier.

General Purpose service tier

The following T-SQL example shows the uptime for the SQL process on the node for the General Purpose service tier:

SELECT sqlserver_start_time, sqlserver_start_time_ms_ticks FROM sys.dm_os_sys_info

The SQL process start time is the time when the SQL Server database engine process was started on the node. The time restarts after failover completes. Run this query before and after you initiate a failover of an instance in the General Purpose service tier to monitor progress of the failover operation.

Business Critical service tier

The following T-SQL example indicates which node is currently the primary replica for the Business Critical service tier:

SELECT DISTINCT replication_endpoint_url, fabric_replica_role_desc FROM sys.dm_hadr_fabric_replica_states

The node that hosts the primary replica changes after failover completes. Run this query before and after you initiate a failover of an instance in the Business Critical service tier to monitor progress of the failover operation.

Note

The full failover process might take several minutes to complete during high-intensity workloads because the instance engine ensures that transactions on the secondary node are caught up to the transactions from the primary node before initiating the failover.

Limitations

Consider the following functional limitations of user-initiated manual failovers:

  • There can only be one (1) failover initiated on the same SQL Managed Instance every 15 minutes.
  • Failovers aren't allowed:
    • Until the first full backup for a new database is completed by automated backup systems.
    • if there's database restore in progress.
  • For instances in the Business Critical service tier:
    • There must exist quorum of replicas for the failover request to be accepted.
    • It is not possible to specify which readable secondary replica to initiate the failover on.