This article teaches you how to monitor and troubleshoot issues with a link between SQL Server and Azure SQL Managed Instance.
You can check the state of the link with Transact-SQL (T-SQL), Azure PowerShell or the Azure CLI. If you encounter issues, you can use the error codes to troubleshoot the problem.
If you run into issues with a link, you can use Transact-SQL (T-SQL), Azure PowerShell or the Azure CLI to get information about the current state of the link.
Use T-SQL for a quick status details of the link state, and then use Azure PowerShell or the Azure CLI for a comprehensive information about the current state of the link.
Use T-SQL to determine the state of the link during the seeding phase, or after data synchronization begins.
Use the following T-SQL query to determine the status of the link during the seeding phase on the SQL Server or SQL Managed Instance that hosts the database seeded through the link:
SELECT
ag.local_database_name AS 'Local database name',
ar.current_state AS 'Current state',
ar.is_source AS 'Is source',
ag.internal_state_desc AS 'Internal state desc',
ag.database_size_bytes / 1024 / 1024 AS 'Database size MB',
ag.transferred_size_bytes / 1024 / 1024 AS 'Transferred MB',
ag.transfer_rate_bytes_per_second / 1024 / 1024 AS 'Transfer rate MB/s',
ag.total_disk_io_wait_time_ms / 1000 AS 'Total Disk IO wait (sec)',
ag.total_network_wait_time_ms / 1000 AS 'Total Network wait (sec)',
ag.is_compression_enabled AS 'Compression',
ag.start_time_utc AS 'Start time UTC',
ag.estimate_time_complete_utc as 'Estimated time complete UTC',
ar.completion_time AS 'Completion time',
ar.number_of_attempts AS 'Attempt No'
FROM sys.dm_hadr_physical_seeding_stats AS ag
INNER JOIN sys.dm_hadr_automatic_seeding AS ar
ON local_physical_seeding_id = operation_id
-- Estimated seeding completion time
SELECT DISTINCT CONVERT(VARCHAR(8), DATEADD(SECOND, DATEDIFF(SECOND, start_time_utc, estimate_time_complete_utc) ,0), 108) as 'Estimated complete time'
FROM sys.dm_hadr_physical_seeding_stats
If the query returns no results, then the seeding process hasn't started or has already completed.
Use the following T-SQL query on the primary instance to check the health of the link once data synchronization begins:
DECLARE @link_name varchar(max) = '<DAGname>'
SELECT
rs.synchronization_health_desc [Link sync health]
FROM
sys.availability_groups ag
join sys.dm_hadr_availability_replica_states rs
on ag.group_id = rs.group_id
WHERE
rs.is_local = 0 AND rs.role = 2 AND ag.is_distributed = 1 AND ag.name = @link_name
GO
The query returns the following possible values:
no result: The query was executed on the secondary instance.
HEALTHY: The link is healthy, and data is being synchronized between the replicas.
NOT_HEALTHY: The link is unhealthy, and data is not synchronizing between the replicas.
Run the following sample code in Azure Cloud Shell or install the Az.SQL module locally.
$ManagedInstanceName = "<ManagedInstanceName>" # The name of your linked SQL Managed Instance
$DAGName = "<DAGName>" # distributed availability group name
# Find out the resource group name
$ResourceGroupName = (Get-AzSqlInstance -InstanceName $ManagedInstanceName).ResourceGroupName
# Show link state details
(Get-AzSqlInstanceLink -ResourceGroupName $ResourceGroupName -InstanceName $ManagedInstanceName -Name $DAGName).Databases
# type "az" to use Azure CLI
managedInstanceName = "<ManagedInstanceName>" # The name of your linked SQL Managed Instance
dagName = "<DAGName>" # distributed availability group name
rgName = "<RGName>" # the resource group for the linked SQL Managed Instance
# Print link state details
az sql mi link show --resource-group $rgName --instance-name $managedInstanceName --name $dagName
The replicaState value describes the current link. If the state also includes Error then an error occurred during the operation listed in the state. For example, LinkCreationError indicates that an error occurred while creating the link.
Some possible replicaState values are:
CreatingLink: Initial seeding
LinkSynchronizing: Data replication is in progress
There are two distinct categories of errors you can encounter when using the link - errors when you try to initialize the link, and errors when you try to create the link.
Errors initializing a link
The following error can occur when initializing a link (Link state: LinkInitError):
Error 41962: Operation aborted because the link wasn't initiated within 5 minutes. Check network connectivity and try again.
Error 41976: The availability group isn't responding. Check names and configuration parameters and try again.
Error 41986: Link can't be established because the connection failed or the secondary replica isn't responsive. Check names, configuration parameters, and network connectivity and then try again.
Error 47521: Link can't be established because the secondary server didn't receive the request. Make sure the availability group and databases are healthy on the primary server and try again.
Errors creating a link
The following error can occur when creating a link (Link state: LinkCreationError):
Error 41977: The target database isn't responsive. Check link parameters and try again.
Inconsistent state after forced failover
Following a forced failover, you might encounter a split-brain scenario where both replicas are in the primary role, leaving the link in an inconsistent state. This can happen if you fail over to the secondary replica during a disaster, and then the primary replica comes back online.
First, confirm you're in a split-brain scenario. You can do so by using SQL Server Management Studio (SSMS) or Transact-SQL (T-SQL).
Connect to both SQL Server and SQL managed instance in SSMS, and then in Object Explorer, expand Availability replicas under the Availability group node in Always On High Availability. If two different replicas are listed as (Primary), you're in a split-brain scenario.
Alternatively, you can run the following T-SQL script on both SQL Server and SQL Managed Instance to check the role of the replicas:
-- Execute on SQL Server and SQL Managed Instance
USE master
DECLARE @link_name varchar(max) = '<DAGName>'
SELECT
ag.name [Link name],
rs.role_desc [Link role]
FROM
sys.availability_groups ag
JOIN sys.dm_hadr_availability_replica_states rs
ON ag.group_id = rs.group_id
WHERE
rs.is_local = 1 AND ag.is_distributed = 1 AND ag.name = @link_name
GO
If both instances list PRIMARY in Link role column, you're in a split-brain scenario.
To resolve the split brain state, first take a backup on whichever replica was the original primary. If the original primary was SQL Server, then take a tail log backup. If the original primary was SQL Managed Instance, then take a copy-only full backup. After the backup completes, set the distributed availability group to the secondary role for the replica that used to be the original primary but will now be the new secondary.
For example, in the event of a true disaster, assuming you've forced a failover of your SQL Server workload to Azure SQL Managed Instance, and you intend to continue running your workload on SQL Managed Instance, take a tail log backup on SQL Server, and then set the distributed availability group to the secondary role on SQL Server such as the following example:
--Execute on SQL Server
USE master
ALTER AVAILABILITY GROUP [<DAGName>]
SET (ROLE = SECONDARY)
GO
Next, execute a planned manual failover from SQL Managed Instance to SQL Server by using the link, such as the following example:
--Execute on SQL Managed Instance
USE master
ALTER AVAILABILITY GROUP [<DAGName>] FAILOVER
GO
Test network connectivity
Bidirectional network connectivity between SQL Server and SQL Managed Instance is necessary for the link to work. After you open ports on the SQL Server side and configure an NSG rule on the SQL Managed Instance side, test connectivity by using either SQL Server Management Studio (SSMS) or Transact-SQL.
Test the network by creating a temporary SQL Agent job on both SQL Server and SQL Managed Instance to check the connection between the two instances. When you use Network Checker in SSMS, the job is automatically created for you, and deleted after the test completes. You need to manually delete the SQL Agent job if you test your network by using T-SQL.
Note
Executing PowerShell scripts by the SQL Server Agent on SQL Server on Linux is not currently supported, so it's not currently possible to execute Test-NetConnection from the SQL Server Agent job on SQL Server on Linux.
To use the SQL Agent to test network connectivity, you need the following requirements:
The user doing the test must have permissions to create a job (either as a sysadmin or belongs to the SQLAgentOperator role for msdb) for both SQL Server and SQL Managed Instance.
The SQL Server Agent service must be running on SQL Server. Since the Agent is on by default on SQL Managed Instance, no additional action is necessary.
To test network connectivity between SQL Server and SQL Managed Instance in SSMS, follow these steps:
Connect to the instance that will be the primary replica in SSMS.
In Object Explorer, expand databases, and right-click the database you intend to link with the secondary. Select Tasks > Azure SQL Managed Instance link > Test Connection to open the Network Checker wizard:
Select Next on the Introduction page of the Network Checker wizard.
If all requirements are met on the Prerequisites page, select Next. Otherwise resolve any unmet prerequisites, and then select Re-run Validation.
On the Login page, select Login to connect to the other instance that will be the secondary replica. Select Next.
Check details on the Specify Network Options page and provide an IP address, if necessary. Select Next.
On the Summary page, review the actions the wizard takes and then select Finish to test the connection between the two replicas.
Review the Results page to validate connectivity exists between the two replicas, and then select Close to finish.
To use T-SQL to test connectivity, you have to check the connection in both directions. First, test the connection from SQL Server to SQL Managed Instance, and then test the connection from SQL Managed Instance to SQL Server.
Test connection from SQL Server to SQL Managed Instance
Use SQL Server Agent on SQL Server to run connectivity tests from SQL Server to SQL Managed Instance.
Connect to SQL Managed Instance, and run the following script to generate parameters you'll need later:
SELECT 'DECLARE @serverName NVARCHAR(512) = N''' + value + ''''
FROM sys.dm_hadr_fabric_config_parameters
WHERE parameter_name = 'DnsRecordName'
UNION
SELECT 'DECLARE @node NVARCHAR(512) = N''' + NodeName + '.' + Cluster + ''''
FROM (
SELECT SUBSTRING(replica_address, 0, CHARINDEX('\', replica_address)) AS NodeName,
RIGHT(service_name, CHARINDEX('/', REVERSE(service_name)) - 1) AppName,
JoinCol = 1
FROM sys.dm_hadr_fabric_partitions fp
INNER JOIN sys.dm_hadr_fabric_replicas fr
ON fp.partition_id = fr.partition_id
INNER JOIN sys.dm_hadr_fabric_nodes fn
ON fr.node_name = fn.node_name
WHERE service_name LIKE '%ManagedServer%'
AND replica_role = 2
) t1
LEFT JOIN (
SELECT value AS Cluster,
JoinCol = 1
FROM sys.dm_hadr_fabric_config_parameters
WHERE parameter_name = 'ClusterName'
) t2
ON (t1.JoinCol = t2.JoinCol)
INNER JOIN (
SELECT [value] AS AppName
FROM sys.dm_hadr_fabric_config_parameters
WHERE section_name = 'SQL'
AND parameter_name = 'InstanceName'
) t3
ON (t1.AppName = t3.AppName)
UNION
SELECT 'DECLARE @port NVARCHAR(512) = N''' + value + ''''
FROM sys.dm_hadr_fabric_config_parameters
WHERE parameter_name = 'HadrPort';
Save the results to use the next steps. Since these parameters can change after any failover, be sure to generate them again, if necessary.
Connect to SQL Server.
Open a new query window and paste the following script:
--START
-- Parameters section
DECLARE @node NVARCHAR(512) = N''
DECLARE @port NVARCHAR(512) = N''
DECLARE @serverName NVARCHAR(512) = N''
--Script section
IF EXISTS (
SELECT job_id
FROM msdb.dbo.sysjobs_view
WHERE name = N'TestMILinkConnection'
)
EXEC msdb.dbo.sp_delete_job @job_name = N'TestMILinkConnection',
@delete_unused_schedule = 1
DECLARE @jobId BINARY (16),
@cmd NVARCHAR(MAX)
EXEC msdb.dbo.sp_add_job @job_name = N'TestMILinkConnection',
@enabled = 1,
@job_id = @jobId OUTPUT
SET @cmd = (N'tnc ' + @serverName + N' -port 5022 | select ComputerName, RemoteAddress, TcpTestSucceeded | Format-List')
EXEC msdb.dbo.sp_add_jobstep @job_id = @jobId,
@step_name = N'Test Port 5022',
@step_id = 1,
@cmdexec_success_code = 0,
@on_success_action = 3,
@on_fail_action = 3,
@subsystem = N'PowerShell',
@command = @cmd,
@database_name = N'master'
SET @cmd = (N'tnc ' + @node + N' -port ' + @port + ' | select ComputerName, RemoteAddress, TcpTestSucceeded | Format-List')
EXEC msdb.dbo.sp_add_jobstep @job_id = @jobId,
@step_name = N'Test HADR Port',
@step_id = 2,
@cmdexec_success_code = 0,
@subsystem = N'PowerShell',
@command = @cmd,
@database_name = N'master'
EXEC msdb.dbo.sp_add_jobserver @job_id = @jobId,
@server_name = N'(local)'
GO
EXEC msdb.dbo.sp_start_job @job_name = N'TestMILinkConnection'
GO
--Check status every 5 seconds
DECLARE @RunStatus INT
SET @RunStatus = 10
WHILE (@RunStatus >= 4)
BEGIN
SELECT DISTINCT @RunStatus = run_status
FROM [msdb].[dbo].[sysjobhistory] JH
INNER JOIN [msdb].[dbo].[sysjobs] J
ON JH.job_id = J.job_id
WHERE J.name = N'TestMILinkConnection'
AND step_id = 0
WAITFOR DELAY '00:00:05';
END
--Get logs once job completes
SELECT [step_name],
SUBSTRING([message], CHARINDEX('TcpTestSucceeded', [message]), CHARINDEX('Process Exit', [message]) - CHARINDEX('TcpTestSucceeded', [message])) AS TcpTestResult,
SUBSTRING([message], CHARINDEX('RemoteAddress', [message]), CHARINDEX('TcpTestSucceeded', [message]) - CHARINDEX('RemoteAddress', [message])) AS RemoteAddressResult,
[run_status],
[run_duration],
[message]
FROM [msdb].[dbo].[sysjobhistory] JH
INNER JOIN [msdb].[dbo].[sysjobs] J
ON JH.job_id = J.job_id
WHERE J.name = N'TestMILinkConnection'
AND step_id <> 0
--END
Replace the @node, @port, and @serverName parameters with the values you got from the first step.
Run the script and check the results. You should see results such as the following example:
Verify the results:
The outcome of each test at TcpTestSucceeded should be TcpTestSucceeded : True.
The RemoteAddresses should belong to the IP range for the SQL Managed Instance subnet.
If the response is unsuccessful, verify the following network settings:
There are rules in both the network firewall and the SQL Server host OS (Windows/Linux) firewall that allows traffic to the entire subnet IP range of SQL Managed Instance.
There's an NSG rule that allows communication on port 5022 for the virtual network that hosts SQL Managed Instance.
Test connection from SQL Managed Instance to SQL Server
To check that SQL Managed Instance can reach SQL Server, first create a test endpoint. Then you use the SQL Server Agent to run a PowerShell script with the tnc command pinging SQL Server on port 5022 from the SQL managed instance.
To create a test endpoint, connect to SQL Server and run the following T-SQL script:
-- Run on SQL Server
-- Create the certificate needed for the test endpoint
USE MASTER
CREATE CERTIFICATE TEST_CERT
WITH SUBJECT = N'Certificate for SQL Server',
EXPIRY_DATE = N'3/30/2051'
GO
-- Create the test endpoint on SQL Server
USE MASTER
CREATE ENDPOINT TEST_ENDPOINT
STATE=STARTED
AS TCP (LISTENER_PORT=5022, LISTENER_IP = ALL)
FOR DATABASE_MIRRORING (
ROLE=ALL,
AUTHENTICATION = CERTIFICATE TEST_CERT,
ENCRYPTION = REQUIRED ALGORITHM AES
)
To verify that the SQL Server endpoint is receiving connections on port 5022, run the following PowerShell command on the host operating system of your SQL Server instance:
tnc localhost -port 5022
A successful test shows TcpTestSucceeded : True. You can then proceed to create a SQL Server Agent job on the SQL managed instance to try testing the SQL Server test endpoint on port 5022 from the SQL managed instance.
Next, create a SQL Server Agent job on the SQL managed instance called NetHelper by running the following T-SQL script on the SQL managed instance. Replace:
<SQL_SERVER_IP_ADDRESS> with the IP address of SQL Server that can be accessed from SQL managed instance.
-- Run on SQL managed instance
-- SQL_SERVER_IP_ADDRESS should be an IP address that could be accessed from the SQL Managed Instance host machine.
DECLARE @SQLServerIpAddress NVARCHAR(MAX) = '<SQL_SERVER_IP_ADDRESS>'; -- insert your SQL Server IP address in here
DECLARE @tncCommand NVARCHAR(MAX) = 'tnc ' + @SQLServerIpAddress + ' -port 5022 -InformationLevel Quiet';
DECLARE @jobId BINARY(16);
IF EXISTS (
SELECT *
FROM msdb.dbo.sysjobs
WHERE name = 'NetHelper'
) THROW 70000,
'Agent job NetHelper already exists. Please rename the job, or drop the existing job before creating it again.',
1
-- To delete NetHelper job run: EXEC msdb.dbo.sp_delete_job @job_name=N'NetHelper'
EXEC msdb.dbo.sp_add_job @job_name = N'NetHelper',
@enabled = 1,
@description = N'Test SQL Managed Instance to SQL Server network connectivity on port 5022.',
@category_name = N'[Uncategorized (Local)]',
@owner_login_name = N'sa',
@job_id = @jobId OUTPUT;
EXEC msdb.dbo.sp_add_jobstep @job_id = @jobId,
@step_name = N'TNC network probe from SQL MI to SQL Server',
@step_id = 1,
@os_run_priority = 0,
@subsystem = N'PowerShell',
@command = @tncCommand,
@database_name = N'master',
@flags = 40;
EXEC msdb.dbo.sp_update_job @job_id = @jobId,
@start_step_id = 1;
EXEC msdb.dbo.sp_add_jobserver @job_id = @jobId,
@server_name = N'(local)';
Tip
If you need to modify the IP address of your SQL Server for the connectivity probe from SQL managed instance, delete NetHelper job by running EXEC msdb.dbo.sp_delete_job @job_name=N'NetHelper', and re-create NetHelper job using the previous script.
Then, create a stored procedure ExecuteNetHelper that helps run the job, and obtains results from the network probe. Run the following T-SQL script on SQL managed instance:
-- Run on managed instance
IF EXISTS(SELECT * FROM sys.objects WHERE name = 'ExecuteNetHelper')
THROW 70001, 'Stored procedure ExecuteNetHelper already exists. Rename or drop the existing procedure before creating it again.', 1
GO
CREATE PROCEDURE ExecuteNetHelper AS
-- To delete the procedure run: DROP PROCEDURE ExecuteNetHelper
BEGIN
-- Start the job.
DECLARE @NetHelperstartTimeUtc DATETIME = GETUTCDATE();
DECLARE @stop_exec_date DATETIME = NULL;
EXEC msdb.dbo.sp_start_job @job_name = N'NetHelper';
-- Wait for job to complete and then see the outcome.
WHILE (@stop_exec_date IS NULL)
BEGIN
-- Wait and see if the job has completed.
WAITFOR DELAY '00:00:01'
SELECT @stop_exec_date = sja.stop_execution_date
FROM msdb.dbo.sysjobs sj
INNER JOIN msdb.dbo.sysjobactivity sja
ON sj.job_id = sja.job_id
WHERE sj.name = 'NetHelper'
-- If job has completed, get the outcome of the network test.
IF (@stop_exec_date IS NOT NULL)
BEGIN
SELECT sj.name JobName,
sjsl.date_modified AS 'Date executed',
sjs.step_name AS 'Step executed',
sjsl.log AS 'Connectivity status'
FROM msdb.dbo.sysjobs sj
LEFT JOIN msdb.dbo.sysjobsteps sjs
ON sj.job_id = sjs.job_id
LEFT JOIN msdb.dbo.sysjobstepslogs sjsl
ON sjs.step_uid = sjsl.step_uid
WHERE sj.name = 'NetHelper'
END
-- In case of operation timeout (90 seconds), print timeout message.
IF (datediff(second, @NetHelperstartTimeUtc, getutcdate()) > 90)
BEGIN
SELECT 'NetHelper timed out during the network check. Please investigate SQL Agent logs for more information.'
BREAK;
END
END
END;
Run the following query on SQL managed instance to execute the stored procedure that will execute the NetHelper agent job and show the resulting log:
-- Run on managed instance
EXEC ExecuteNetHelper;
If the connection was successful, the log shows True. If the connection was unsuccessful, the log shows False.
If the connection was unsuccessful, verify the following items:
The firewall on the host SQL Server instance allows inbound and outbound communication on port 5022.
An NSG rule for the virtual network that hosts SQL Managed Instance allows communication on port 5022.
If your SQL Server instance is on an Azure VM, an NSG rule allows communication on port 5022 on the virtual network that hosts the VM.
SQL Server is running.
There exists test endpoint on SQL Server.
After resolving issues, rerun NetHelper network probe again by running EXEC ExecuteNetHelper on managed instance.
Finally, after the network test is successful, drop the test endpoint and certificate on SQL Server by using the following T-SQL commands:
-- Run on SQL Server
DROP ENDPOINT TEST_ENDPOINT;
GO
DROP CERTIFICATE TEST_CERT;
GO
Caution
Proceed with the next steps only if you've validated network connectivity between your source and target environments. Otherwise, troubleshoot network connectivity issues before proceeding.
Related content
For more information on the link feature, review the following resources: