Troubleshoot VM extension-based Hybrid Runbook Worker issues in Automation
This article provides information on troubleshooting and resolving issues with Azure Automation extension-based Hybrid Runbook Workers. For troubleshooting agent-based workers, see Troubleshoot agent-based Hybrid Runbook Worker issues in Automation. For general information, see Hybrid Runbook Worker overview.
General checklist
To help troubleshoot issues with extension-based Hybrid Runbook Workers:
Check that the OS is supported and the prerequisites have been met. See Prerequisites.
Check whether the system-assigned managed identity is enabled on the VM. Azure VMs and Azure Arc-enabled machines should have a system-assigned managed identity enabled.
Check whether the extension is enabled with the right settings. The settings file should have the right AutomationAccountURL. Cross-check the URL with the Automation account property AutomationHybridServiceUrl.
- For Windows, you can find the settings file here:
Tip
Replace * in the below path with the specific version that is installed if you know it.
C:\Packages\Plugins\Microsoft.Azure.Automation.HybridWorker.HybridWorkerForWindows\*\RuntimeSettings
- For Linux, you can find the settings file here:
/var/lib/waagent/Microsoft.Azure.Automation.HybridWorker.HybridWorkerForLinux/
Check the error message shown in the Hybrid worker extension status/Detailed Status. It contains error message(s) and respective recommendation(s) to fix the issue.
Run the troubleshooter tool on the VM; it generates an output file. Open the output file and verify the errors identified by the troubleshooter tool.
- For Windows, you can find the troubleshooter here:
Tip
Replace * in the below path with the specific version that is installed if you know it.
C:\Packages\Plugins\Microsoft.Azure.Automation.HybridWorker.HybridWorkerForWindows\*\bin\troubleshooter\TroubleShootWindowsExtension.ps1
- For Linux, you can find the troubleshooter here:
Tip
Replace * in the below path with the specific version that is installed if you know it.
/var/lib/waagent/Microsoft.Azure.Automation.HybridWorker.HybridWorkerForLinux-*/Troubleshooter/LinuxTroubleshooter.py
For Linux machines, the Hybrid worker extension creates a hweautomation user and starts the Hybrid worker under that user. Check whether the hweautomation user is set up with the correct permissions. If your runbook is trying to access any local resources, ensure that the hweautomation user has the correct permissions to those resources.
Check whether the hybrid worker process is running.
- For Windows, check the Hybrid Worker Service (HybridWorkerService) service.
- For Linux, check the hwd service.
Collect logs:
- For Windows, run the log collector tool located here:
Tip
Replace * in the below path with the specific version that is installed if you know it.
C:\Packages\Plugins\Microsoft.Azure.Automation.HybridWorker.HybridWorkerForWindows\*\bin\troubleshooter\PullLogs.ps1
Logs will be located here:
C:\HybridWorkerExtensionLogs
- For Linux, logs are in the following folders:
/var/log/azure/Microsoft.Azure.Automation.HybridWorker.HybridWorkerForLinux
/home/hweautomation
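Several paths in the checklist above contain a * placeholder for the installed extension version. The following is a minimal Python sketch, not part of the extension itself, that lists the directories matching such a wildcard path so you can see which version folders are actually present. The function name find_installed_versions is hypothetical, and the two patterns are copied from the checklist. Results are sorted by name, which is not necessarily version order.

```python
import glob
import os


def find_installed_versions(pattern: str) -> list[str]:
    """Return the directories that match a wildcard path, sorted by name."""
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


# Wildcard patterns taken from the checklist above.
WINDOWS_PATTERN = (
    r"C:\Packages\Plugins"
    r"\Microsoft.Azure.Automation.HybridWorker.HybridWorkerForWindows\*"
)
LINUX_PATTERN = (
    "/var/lib/waagent/"
    "Microsoft.Azure.Automation.HybridWorker.HybridWorkerForLinux-*"
)

if __name__ == "__main__":
    # Pick the pattern for the current platform and print every match.
    pattern = WINDOWS_PATTERN if os.name == "nt" else LINUX_PATTERN
    for path in find_installed_versions(pattern):
        print(path)
```

If more than one directory is printed, more than one version folder exists on the machine, and you may need to check which one the running extension uses.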
Scenario: Runbooks go into a suspended state on a Hybrid Runbook Worker when using a custom account on a server with User Account Control (UAC) enabled
Issue
Jobs fail and go into a suspended state on the Hybrid Runbook Worker. The Microsoft-SMA event logs indicate Win32 Process Exited with code [2148734720], and when the runbook tries to execute, the Application log shows a corresponding error, .NET Runtime version : 4.0.30319.0, indicating that the application couldn't be started.
Cause
When a system has UAC/LUA in place, permissions must be granted directly to the user and not through any group membership. When the user has to elevate permissions, the jobs begin to fail.
Resolution
For the custom user on the Hybrid Runbook Worker, update the permissions on the following folders:
Folder | Permissions |
---|---|
C:\ProgramData\AzureConnectedMachineAgent\Tokens | Read |
C:\Packages\Plugins\Microsoft.Azure.Automation.HybridWorker.HybridWorkerForWindows | Read and Execute |
Scenario: Job failed to start as the Hybrid Worker wasn't available when the scheduled job started
Issue
Job fails to start on a Hybrid Worker, and you see the following error:
Failed to start, as hybrid worker wasn't available when scheduled job started, the hybrid worker was last active at mm/dd/yyyy.
Cause
This error can occur due to the following reasons:
- The machine doesn't exist anymore.
- The machine is turned off and is unreachable.
- The machine has a network connectivity issue.
- The Hybrid Runbook Worker extension has been uninstalled from the machine.
Resolution
- Ensure that the machine exists, and Hybrid Runbook Worker extension is installed on it. The Hybrid Worker should be healthy and should give a heartbeat. Troubleshoot any network issues by checking the Microsoft-SMA event logs on the Workers in the Hybrid Runbook Worker Group that tried to run this job.
- You can also monitor the HybridWorkerPing metric that provides the number of pings from a Hybrid Worker and can help to check ping-related issues.
Scenario: Job was suspended as it exceeded the job limit for a Hybrid Worker
Issue
Job gets suspended with the following error message:
Job was suspended as it exceeded the job limit for a Hybrid Worker. Add more Hybrid Workers to the Hybrid Worker group to overcome this issue.
Cause
Jobs might get suspended due to any of the following reasons:
- Each active Hybrid Worker in the group polls every 30 seconds to see if any jobs are available. Workers pick jobs on a first-come, first-served basis. Depending on when a job was pushed, whichever Hybrid Worker within the Hybrid Worker Group pings the Automation service first picks up the job. A single Hybrid Worker can generally pick up four jobs per ping (that is, every 30 seconds). If your rate of pushing jobs is higher than four per 30 seconds and no other Worker picks up the job, the job might get suspended.
- The Hybrid Worker might not be polling as expected every 30 seconds. This could happen if the Worker isn't healthy or there are network issues.
Resolution
- If the rate of jobs exceeds the limit of four jobs per 30 seconds for a Hybrid Worker, you can add more Hybrid Workers to the Hybrid Worker group for high availability and load balancing. You can also schedule jobs so that they don't exceed the limit of four jobs per 30 seconds. The processing time of the job queue depends on the Hybrid Worker hardware profile and load. Ensure that the Hybrid Worker is healthy and gives a heartbeat.
- Troubleshoot any network issues by checking the Microsoft-SMA event logs on the Workers in the Hybrid Runbook Worker Group that tried to run this job.
- You can also monitor the HybridWorkerPing metric that provides the number of pings from a Hybrid Worker and can help to check ping-related issues.
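As a rough illustration of the limit described above, the following Python sketch estimates how many Hybrid Workers a group needs for a given job push rate. It assumes every worker is healthy, polls every 30 seconds, and picks up four jobs per poll. It ignores job duration and hardware load, so treat the result as a lower bound rather than a sizing guarantee. The function name min_workers_needed is hypothetical.

```python
import math

JOBS_PER_POLL = 4            # jobs one worker generally picks up per poll
POLL_INTERVAL_SECONDS = 30   # how often each worker polls for jobs


def min_workers_needed(jobs_per_minute: float) -> int:
    """Estimate the fewest workers whose combined pickup rate covers the push rate."""
    # One worker handles 4 jobs every 30 seconds, that is, 8 jobs per minute.
    per_worker_per_minute = JOBS_PER_POLL * 60 / POLL_INTERVAL_SECONDS
    # A group always needs at least one worker, even for a very low rate.
    return max(1, math.ceil(jobs_per_minute / per_worker_per_minute))


if __name__ == "__main__":
    for rate in (5, 8, 9, 20):
        print(f"{rate} jobs/minute -> at least {min_workers_needed(rate)} worker(s)")
```

For example, pushing 20 jobs per minute exceeds what two workers can pick up (16 per minute), so the estimate is three workers.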
Scenario: Hybrid Worker deployment fails with Private Link error
Issue
You are deploying an extension-based Hybrid Runbook Worker on a VM and it fails with error: Authentication failed for private links.
Cause
The VM is in a different virtual network from the private endpoint of the Azure Automation account, or the two networks aren't connected.
Resolution
Ensure that the private endpoint of the Azure Automation account is connected to the same virtual network as the VM. Follow the steps mentioned in Planning based on your network to connect to a private endpoint. Also set the public network access flags to configure the Automation account to deny all public configuration and allow only connections through private endpoints. For more information on how to configure DNS settings for private endpoints, see DNS configuration.
Scenario: Hybrid Worker deployment fails when the provided Hybrid Worker group does not exist
Issue
You are deploying an extension-based Hybrid Runbook Worker on a VM and it fails with error: Account/Group specified does not exist.
Cause
The Hybrid Runbook Worker group to which the Hybrid Worker is to be deployed has been deleted.
Resolution
Ensure that you create the Hybrid Runbook Worker group and add the VM as a Hybrid Worker in that group. Follow the steps mentioned in create a Hybrid Runbook Worker group using the Azure portal.
Scenario: Hybrid Worker deployment fails when system-assigned managed identity is not enabled on the VM
Issue
You are deploying an extension-based Hybrid Runbook Worker on a VM and it fails with error:
Unable to retrieve IMDS identity endpoint for non-Azure VM. Ensure that the Azure connected machine agent is installed and System-assigned identity is enabled.
Cause
You are deploying the extension-based Hybrid Worker on a non-Azure VM that does not have the Arc connected machine agent installed on it.
Resolution
A non-Azure machine must have the Arc connected machine agent installed before you deploy it as an extension-based Hybrid Runbook Worker. To install the AzureConnectedMachineAgent, see connect hybrid machines to Azure from the Azure portal for Arc-enabled servers.
Scenario: Hybrid Worker deployment fails due to System assigned identity not enabled
Issue
You are deploying an extension-based Hybrid Runbook Worker on a VM, and it fails with error: Invalid Authorization Token.
Cause
User-assigned managed identity of the VM is enabled, but system-assigned managed identity isn't enabled.
Resolution
Follow the steps listed below:
- Enable System-assigned managed identity of the VM.
- Delete the Hybrid Worker extension installed on the VM.
- Reinstall the Hybrid Worker extension on the VM.
Scenario: Installation process of Hybrid Worker extension on Windows VM gets stuck
Issue
You have installed Hybrid Worker extension on a Windows VM from the Portal, but don't get a notification that the process has completed successfully.
Cause
Sometimes the installation process might get stuck.
Resolution
Follow the steps mentioned below to install Hybrid Worker extension again:
Open PowerShell console.
Remove the registry key, if present:
HKLM:\Software\Microsoft\Azure\HybridWorker
PowerShell code to remove the registry key along with any subkeys and values under it:
Get-Item HKLM:\Software\Microsoft\Azure\HybridWorker | Remove-Item -Recurse
Remove the registry key, if present:
HKLM:\Software\Microsoft\HybridRunbookWorkerV2
PowerShell code to remove the registry key along with any subkeys and values under it:
Get-Item HKLM:\Software\Microsoft\HybridRunbookWorkerV2 | Remove-Item -Recurse
Navigate to the Hybrid Worker extension installation folder:
Tip
Replace * in the below command with the specific version that is installed if you know it.
cd "C:\Packages\Plugins\Microsoft.Azure.Automation.HybridWorker.HybridWorkerForWindows\*"
Install the Hybrid Worker extension:
.\bin\install.ps1
Enable the Hybrid Worker extension:
.\bin\enable.ps1
Scenario: Uninstallation process of Hybrid Worker extension on Windows VM gets stuck
Issue
You have uninstalled a Hybrid Worker extension on a Windows VM from the portal, but don't get a notification that the process has completed successfully.
Cause
Sometimes the uninstallation process might get stuck.
Resolution
Open PowerShell console.
Navigate to the Hybrid Worker extension installation folder:
Tip
Replace * in the below command with the specific version that is installed if you know it.
cd "C:\Packages\Plugins\Microsoft.Azure.Automation.HybridWorker.HybridWorkerForWindows\*"
Disable the Hybrid Worker extension:
.\bin\disable.cmd
Uninstall the Hybrid Worker extension:
.\bin\uninstall.ps1
Remove registry key, if present:
HKLM:\Software\Microsoft\Azure\HybridWorker
PowerShell code to remove the registry key along with any subkeys and values under it:
Get-Item HKLM:\Software\Microsoft\Azure\HybridWorker | Remove-Item -Recurse
Remove the registry key, if present:
HKLM:\Software\Microsoft\HybridRunbookWorkerV2
PowerShell code to remove the registry key along with any subkeys and values under it:
Get-Item HKLM:\Software\Microsoft\HybridRunbookWorkerV2 | Remove-Item -Recurse
Scenario: Installation process of Hybrid Worker extension on Linux VM gets stuck
Issue
You have installed a Hybrid Worker extension on a Linux VM from the portal, but don't get a notification that the process has completed successfully.
Cause
Sometimes the installation process might get stuck.
Resolution
Delete the state folder:
rm -r /home/hweautomation/state
Navigate to the Hybrid Worker extension installation folder:
Tip
Replace * in the below command with the specific version that is installed if you know it.
cd /var/lib/waagent/Microsoft.Azure.Automation.HybridWorker.HybridWorkerForLinux-*/
Delete the mrseq file:
rm mrseq
Install the Hybrid Worker Extension:
./extension_shim.sh -c ./HWExtensionHandlers.py -i
Enable the Hybrid Worker extension:
./extension_shim.sh -c ./HWExtensionHandlers.py -e
Scenario: Uninstallation process of Hybrid Worker extension on Linux VM gets stuck
Issue
You have uninstalled Hybrid Worker extension on a Linux VM from the portal, but don't get a notification that the process has completed successfully.
Cause
Sometimes the uninstallation process might get stuck.
Resolution
Follow the steps mentioned below to completely uninstall Hybrid Worker extension:
- Navigate to the Hybrid Worker Extension installation folder:
Tip
Replace * in the below command with the specific version that is installed if you know it.
cd /var/lib/waagent/Microsoft.Azure.Automation.HybridWorker.HybridWorkerForLinux-*/
- Disable the Hybrid Worker extension:
./extension_shim.sh -c ./HWExtensionHandlers.py -d
- Uninstall the Hybrid Worker extension:
./extension_shim.sh -c ./HWExtensionHandlers.py -u
Scenario: Runbook execution fails
Issue
Runbook execution fails, and you receive the following error message:
The job action 'Activate' cannot be run, because the process stopped unexpectedly. The job action was attempted three times.
Your runbook is suspended shortly after it attempts to execute three times. There are conditions that can interrupt the runbook from completing. The related error message might not include any additional information.
Cause
The following are possible causes:
- The runbooks can't authenticate with local resources.
- The hybrid worker is behind a proxy or firewall.
- The computer configured to run the Hybrid Runbook Worker doesn't meet the minimum hardware requirements.
Resolution
Verify that the computer has outbound access to *.azure-automation.cn on port 443.
Computers running the Hybrid Runbook Worker should meet the minimum hardware requirements before the worker is configured to host this feature. Runbooks and the background process they use might cause the system to be overused and cause runbook job delays or timeouts.
Confirm the computer to run the Hybrid Runbook Worker feature meets the minimum hardware requirements. If it does, monitor CPU and memory use to determine any correlation between the performance of Hybrid Runbook Worker processes and Windows. Any memory or CPU pressure can indicate the need to upgrade resources. You can also select a different compute resource that supports the minimum requirements and scale when workload demands indicate an increase is necessary.
Check the Microsoft-SMA event log for a corresponding event with the description Win32 Process Exited with code [4294967295]. The cause of this error is that you haven't configured authentication in your runbooks or specified the Run As credentials for the Hybrid Runbook Worker group. Review runbook permissions in Running runbooks on a Hybrid Runbook Worker to confirm that you've correctly configured authentication for your runbooks.
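One way to verify outbound access on port 443 is to attempt a TCP connection from the worker machine. The Python sketch below is an illustration only. The function name can_reach is hypothetical, and you must supply the actual endpoint host name for your Automation account as an argument, because the wildcard domain in the resolution above can't be resolved directly. A successful TCP connection shows only that the port is reachable. It doesn't validate TLS inspection, proxy authentication, or DNS configuration for private endpoints.

```python
import socket
import sys


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_port.py <endpoint-host-name>
    target = sys.argv[1]
    state = "reachable" if can_reach(target) else "NOT reachable"
    print(f"{target}:443 is {state}")
```

If the check fails from the worker but succeeds from another machine on a different network, a proxy or firewall between the worker and Azure Automation is a likely cause.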
Scenario: No certificate was found in the certificate store on the Hybrid Runbook Worker
Issue
A runbook running on a Hybrid Runbook Worker fails with the following error message:
Connect-AzAccount : No certificate was found in the certificate store with thumbprint 0000000000000000000000000000000000000000
At line:3 char:1
+ Connect-AzAccount -ServicePrincipal -Tenant $Conn.TenantID -Appl ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : CloseError: (:) [Connect-AzAccount],ArgumentException
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.Profile.ConnectAzAccountCommand
Cause
This error occurs when you attempt to use a Run As account in a runbook that runs on a Hybrid Runbook Worker where the Run As account certificate isn't present. Hybrid Runbook Workers don't have the certificate asset locally by default. The Run As account requires this asset to operate properly.
Resolution
If your Hybrid Runbook Worker is an Azure VM, you can use runbook authentication with managed identities instead. This scenario simplifies authentication by allowing you to authenticate to Azure resources using the managed identity of the Azure VM instead of the Run As account. When the Hybrid Runbook Worker is an on-premises machine, you need to install the Run As account certificate on the machine. To learn how to install the certificate, see the steps to run the PowerShell runbook Export-RunAsCertificateToHybridWorker in Run runbooks on a Hybrid Runbook Worker.
Scenario: Azure VMs automatically dropped from a hybrid worker group
Issue
You can't see the Hybrid Runbook Worker or VMs when the worker machine has been turned off for a long time.
Cause
The Hybrid Runbook Worker machine hasn't pinged Azure Automation for more than 30 days. As a result, Automation has purged the Hybrid Runbook Worker group or the System Worker group.
Resolution
Start the worker machine, and then re-register it with Azure Automation. For instructions on how to install the runbook environment and connect to Azure Automation, see Deploy a Windows Hybrid Runbook Worker.
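To judge whether a worker is at risk of being purged, you can compare its last active time with the 30-day threshold described in the cause. The following Python sketch shows the date arithmetic only. The function name is_past_purge_threshold is hypothetical, and the last-ping timestamp is assumed to come from whatever source you have, such as the last active date shown in a job error message.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

PURGE_AFTER = timedelta(days=30)  # inactivity threshold stated in the cause


def is_past_purge_threshold(last_ping: datetime,
                            now: Optional[datetime] = None) -> bool:
    """Return True if more than 30 days have passed since the last ping."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_ping > PURGE_AFTER


if __name__ == "__main__":
    # Illustrative dates only: 46 days between the last ping and the check.
    last_seen = datetime(2024, 1, 15, tzinfo=timezone.utc)
    check_time = datetime(2024, 3, 1, tzinfo=timezone.utc)
    print(is_past_purge_threshold(last_seen, check_time))
```

Both timestamps should be timezone-aware (or both naive) so that the subtraction is valid.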
Next steps
If you don't see your problem here or you can't resolve your issue, contact support.