Troubleshoot Hybrid Runbook Worker issues

This article provides information on troubleshooting and resolving issues with Azure Automation Hybrid Runbook Workers. For general information, see Hybrid Runbook Worker overview.

General

The Hybrid Runbook Worker depends on an agent to communicate with your Azure Automation account to register the worker, receive runbook jobs, and report status. For Windows, this agent is the Log Analytics agent for Windows. For Linux, it's the Log Analytics agent for Linux.

Scenario: Runbook execution fails

Issue

Runbook execution fails, and you receive the following error message:

"The job action 'Activate' cannot be run, because the process stopped unexpectedly. The job action was attempted three times."

Your runbook is suspended shortly after it attempts to execute three times. Certain conditions can prevent the runbook from completing, and the related error message might not include any additional information.

Cause

The following are possible causes:

  • The runbooks can't authenticate with local resources.
  • The hybrid worker is behind a proxy or firewall.
  • The computer configured to run the Hybrid Runbook Worker doesn't meet the minimum hardware requirements.

Resolution

Verify that the computer has outbound access to *.azure-automation.cn on port 443.
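
On a Linux worker, or any host with bash, the outbound check can be sketched as follows. The helper name check_outbound and the 5-second timeout are illustrative assumptions, not part of any Azure tooling. The host in the example comment is the one that appears in the event 15011 log in this article; substitute the endpoint for your own Automation account.

```bash
# Sketch: test whether a TCP connection to host:port can be opened.
# check_outbound is a hypothetical helper name; the port defaults to 443.
check_outbound() {
  local host="$1" port="${2:-443}"
  # bash's /dev/tcp pseudo-device opens a TCP socket; timeout caps the wait at 5 seconds.
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example call (replace the host with your own endpoint):
# check_outbound cc-jobruntimedata-prod-su1.azure-automation.cn 443 && echo reachable || echo blocked
```

A successful connection shows only that the port is reachable from the machine. It doesn't validate TLS or proxy authentication, so a worker behind an authenticating proxy might still fail.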

Computers running the Hybrid Runbook Worker should meet the minimum hardware requirements before the worker is configured to host this feature. Otherwise, runbooks and the background processes they use might overload the system and cause runbook job delays or timeouts.

Confirm that the computer that will run the Hybrid Runbook Worker feature meets the minimum hardware requirements. If it does, monitor CPU and memory use to determine any correlation between the performance of Hybrid Runbook Worker processes and Windows. Any memory or CPU pressure can indicate the need to upgrade resources. You can also select a different compute resource that supports the minimum requirements and scale when workload demands indicate that an increase is necessary.

Check the Microsoft-SMA event log for a corresponding event with the description Win32 Process Exited with code [4294967295]. This error occurs when you haven't configured authentication in your runbooks or specified the Run As credentials for the Hybrid Runbook Worker group. Review runbook permissions in Running runbooks on a Hybrid Runbook Worker to confirm that you've correctly configured authentication for your runbooks.

Scenario: Event 15011 in the Hybrid Runbook Worker

Issue

The Hybrid Runbook Worker receives event 15011, indicating that a query result isn't valid. The following error appears when the worker attempts to open a connection with the SignalR server.

[AccountId={c7d22bd3-47b2-4144-bf88-97940102f6ca}]
[Uri=https://cc-jobruntimedata-prod-su1.azure-automation.cn/notifications/hub][Exception=System.TimeoutException: Transport timed out trying to connect
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at JobRuntimeData.NotificationsClient.JobRuntimeDataServiceSignalRClient.<Start>d__45.MoveNext()

Cause

The Hybrid Runbook Worker hasn't been configured correctly for the automated feature deployment, for example, for Update Management. The deployment contains a part that connects the VM to the Log Analytics workspace. The PowerShell script looks for the workspace in the subscription with the supplied name. In this case, the Log Analytics workspace is in a different subscription. The script can't find the workspace and tries to create one, but the name is already taken. As a result, the deployment fails.

Resolution

You have two options for resolving this issue:

  • Modify the PowerShell script to look for the Log Analytics workspace in another subscription. This is a good resolution to use if you plan to deploy many Hybrid Runbook Worker machines in the future.

  • Manually configure the worker machine to run in an Orchestrator sandbox. Then run a runbook created in the Azure Automation account on the worker to test the functionality.

Scenario: Windows Azure VMs automatically dropped from a hybrid worker group

Issue

You can't see the Hybrid Runbook Worker or VMs when the worker machine has been turned off for a long time.

Cause

The Hybrid Runbook Worker machine hasn't pinged Azure Automation for more than 30 days. As a result, Automation has purged the Hybrid Runbook Worker group or the System Worker group.

Resolution

Start the worker machine, and then reregister it with Azure Automation. For instructions on how to install the runbook environment and connect to Azure Automation, see Deploy a Windows Hybrid Runbook Worker.

Scenario: No certificate was found in the certificate store on the Hybrid Runbook Worker

Issue

A runbook running on a Hybrid Runbook Worker fails with the following error message:

Connect-AzAccount : No certificate was found in the certificate store with thumbprint 0000000000000000000000000000000000000000
At line:3 char:1
+ Connect-AzAccount -ServicePrincipal -Tenant $Conn.TenantID -Appl ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : CloseError: (:) [Connect-AzAccount], ArgumentException
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.Profile.ConnectAzAccountCommand

Cause

This error occurs when you attempt to use a Run As account in a runbook that runs on a Hybrid Runbook Worker where the Run As account certificate isn't present. Hybrid Runbook Workers don't have the certificate asset locally by default. The Run As account requires this asset to operate properly.

Resolution

If your Hybrid Runbook Worker is an Azure VM, you can use runbook authentication with managed identities instead. This scenario simplifies authentication by allowing you to authenticate to Azure resources using the managed identity of the Azure VM instead of the Run As account. When the Hybrid Runbook Worker is an on-premises machine, you need to install the Run As account certificate on the machine. To learn how to install the certificate, see the steps to run the PowerShell runbook Export-RunAsCertificateToHybridWorker in Run runbooks on a Hybrid Runbook Worker.

Scenario: Error 403 during registration of a Hybrid Runbook Worker

Issue

The worker's initial registration phase fails, and you receive the following error (403):

"Forbidden: You don't have permission to access / on this server."

Cause

The following issues are possible causes:

  • There's a mistyped workspace ID or workspace key (primary) in the agent's settings.
  • The Hybrid Runbook Worker can't download the configuration, which causes an account linking error. When Azure enables features on machines, it supports only certain regions for linking a Log Analytics workspace and an Automation account. It's also possible that an incorrect date or time is set on the computer. If the computer's clock is more than 15 minutes ahead of or behind the current time, feature deployment fails.

Resolution

Mistyped workspace ID or key

To verify whether the agent's workspace ID or workspace key was mistyped, see Adding or removing a workspace - Windows agent for the Windows agent or Adding or removing a workspace - Linux agent for the Linux agent. Make sure to select the full string from the Azure portal, and copy and paste it carefully.

Configuration not downloaded

Your Log Analytics workspace and Automation account must be in a linked region. For a list of supported regions, see Azure Automation and Log Analytics workspace mappings.

You might also need to update the date or time zone of your computer. If you select a custom time range, make sure that the range is in UTC, which can differ from your local time zone.
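
The 15-minute tolerance mentioned in the cause can be checked with a small helper, sketched here under stated assumptions. The name clock_skew_ok is hypothetical, both arguments are Unix epoch seconds, and treating exactly 900 seconds as acceptable is an assumption about the boundary. The example comment assumes curl and GNU date are available and that REFERENCE_URL points at any HTTPS endpoint you trust to return an accurate Date header.

```bash
# Sketch: decide whether two clocks agree to within a tolerance.
# clock_skew_ok is a hypothetical helper; arguments are epoch seconds.
clock_skew_ok() {
  local local_epoch="$1" reference_epoch="$2" max_skew="${3:-900}"  # 900 s = 15 minutes
  local diff=$(( local_epoch - reference_epoch ))
  # Take the absolute value so that fast and slow clocks are treated alike.
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  [ "$diff" -le "$max_skew" ]
}

# Example: compare the local clock with an HTTP Date header (REFERENCE_URL is an assumption):
# ref=$(date -d "$(curl -sI "$REFERENCE_URL" | sed -n 's/^[Dd]ate: //p' | tr -d '\r')" +%s)
# clock_skew_ok "$(date +%s)" "$ref" || echo "clock differs by more than 15 minutes"
```

Comparing epoch seconds sidesteps time zone differences, because both values are measured in UTC regardless of the machine's local setting.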

Linux

The Linux Hybrid Runbook Worker depends on the Log Analytics agent for Linux to communicate with your Automation account to register the worker, receive runbook jobs, and report status. If registration of the worker fails, here are some possible causes for the error.

Scenario: Linux Hybrid Runbook Worker receives prompt for a password when signing a runbook

Issue

Running the sudo command for a Linux Hybrid Runbook Worker returns an unexpected password prompt.

Cause

The nxautomationuser account for the Log Analytics agent for Linux is not correctly configured in the sudoers file. The Hybrid Runbook Worker needs the appropriate configuration of account permissions and other data so that it can sign runbooks on the Linux Runbook Worker.

Resolution
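
One way to inspect the account's sudo configuration is sketched below. The helper name has_nopasswd_rule is hypothetical, and the ACCOUNT variable is a placeholder for the agent's account name on your machine. Which specific commands the account needs to run without a password is not covered by this sketch; it only reports whether any NOPASSWD rule is listed.

```bash
# Sketch: report whether sudo's privilege listing contains a NOPASSWD rule.
# has_nopasswd_rule is a hypothetical helper; it reads `sudo -l -U <user>` output on stdin.
has_nopasswd_rule() {
  grep -q 'NOPASSWD'
}

# Example (run as root or as a user allowed to list other users' privileges;
# ACCOUNT is an assumption, set it to the agent's account name):
# sudo -l -U "$ACCOUNT" | has_nopasswd_rule || echo "no passwordless sudo rule found"
```

If no rule is reported, review the sudoers entries for the account before retrying the runbook.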

Scenario: Log Analytics agent for Linux isn't running

Issue

The Log Analytics agent for Linux isn't running.

Cause

If the agent isn't running, it prevents the Linux Hybrid Runbook Worker from communicating with Azure Automation. The agent might not be running for various reasons.

Resolution

Verify that the agent is running by entering the command ps -ef | grep python. You should see output similar to the following, showing Python processes that run under the nxautomation user account. If the Azure Automation feature isn't enabled, none of the following processes are running.

nxautom+   8567      1  0 14:45 ?        00:00:00 python /opt/microsoft/omsconfig/modules/nxOMSAutomationWorker/DSCResources/MSFT_nxOMSAutomationWorkerResource/automationworker/worker/main.py /var/opt/microsoft/omsagent/state/automationworker/oms.conf rworkspace:<workspaceId> <Linux hybrid worker version>
nxautom+   8593      1  0 14:45 ?        00:00:02 python /opt/microsoft/omsconfig/modules/nxOMSAutomationWorker/DSCResources/MSFT_nxOMSAutomationWorkerResource/automationworker/worker/hybridworker.py /var/opt/microsoft/omsagent/state/automationworker/worker.conf managed rworkspace:<workspaceId> rversion:<Linux hybrid worker version>
nxautom+   8595      1  0 14:45 ?        00:00:02 python /opt/microsoft/omsconfig/modules/nxOMSAutomationWorker/DSCResources/MSFT_nxOMSAutomationWorkerResource/automationworker/worker/hybridworker.py /var/opt/microsoft/omsagent/<workspaceId>/state/automationworker/diy/worker.conf managed rworkspace:<workspaceId> rversion:<Linux hybrid worker version>

The following list shows the processes that are started for a Linux Hybrid Runbook Worker. They're all located in the /var/opt/microsoft/omsagent/state/automationworker/ directory.

  • oms.conf: The worker manager process. It's started directly from DSC.
  • worker.conf: The Auto-Registered hybrid worker process. It's started by the worker manager. This process is used by Update Management and is transparent to the user. This process isn't present if Update Management isn't enabled on the machine.
  • diy/worker.conf: The DIY hybrid worker process. The DIY hybrid worker process is used to execute user runbooks on the Hybrid Runbook Worker. It differs from the Auto-Registered hybrid worker process only in that it uses a different configuration. This process isn't present if Azure Automation is disabled and the DIY Linux Hybrid Worker isn't registered.

If the agent isn't running, run the following command to start the service: sudo /opt/microsoft/omsagent/bin/service_control restart.
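
The check and restart described above can be combined in a short script. This is a minimal sketch: count_worker_procs is a hypothetical helper name, and matching on the worker script path is an assumption based on the sample ps output in this section.

```bash
# Sketch: count Hybrid Runbook Worker Python processes in `ps -ef` output read from stdin.
# count_worker_procs is a hypothetical helper name.
count_worker_procs() {
  # Match on the script path because ps may truncate the user name (for example, nxautom+).
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
  grep -cE 'automationworker/worker/(main|hybridworker)\.py' || true
}

# Example: restart the agent only when no worker process is found.
# if [ "$(ps -ef | count_worker_procs)" -eq 0 ]; then
#   sudo /opt/microsoft/omsagent/bin/service_control restart
# fi
```

Filtering on the path rather than on the word python avoids counting unrelated Python processes on the machine.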

Scenario: The specified class doesn't exist

If you see the error message The specified class does not exist.. in /var/opt/microsoft/omsconfig/omsconfig.log, the Log Analytics agent for Linux needs to be updated. Run the following command to reinstall the agent.

wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh && sh onboard_agent.sh -w <WorkspaceID> -s <WorkspaceKey>
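
To confirm that the log contains this error before reinstalling, a small check could look like the following. The helper name log_has_class_error is hypothetical; the log path in the example comment is the one named in this scenario.

```bash
# Sketch: check a log file for the error text quoted in this scenario.
# log_has_class_error is a hypothetical helper; pass the log path as the first argument.
log_has_class_error() {
  # -F treats the pattern as a fixed string; -q suppresses output and sets the exit status.
  grep -qF 'The specified class does not exist' "$1"
}

# Example:
# log_has_class_error /var/opt/microsoft/omsconfig/omsconfig.log && echo "agent needs to be reinstalled"
```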

Windows

The Windows Hybrid Runbook Worker depends on the Log Analytics agent for Windows to communicate with your Automation account to register the worker, receive runbook jobs, and report status. If registration of the worker fails, this section includes some possible reasons.

Scenario: The Log Analytics agent for Windows isn't running

Issue

The healthservice isn't running on the Hybrid Runbook Worker machine.

Cause

If the Log Analytics agent for Windows service isn't running, the Hybrid Runbook Worker can't communicate with Azure Automation.

Resolution

Verify that the agent is running by entering the following command in PowerShell: Get-Service healthservice. If the service is stopped, enter the following command in PowerShell to start the service: Start-Service healthservice.

Scenario: Hybrid Runbook Worker not reporting

Issue

Your Hybrid Runbook Worker machine is running, but you don't see heartbeat data for the machine in the workspace.

The following example query shows the machines in a workspace and their last heartbeat:

// Last heartbeat of each computer
Heartbeat
| summarize arg_max(TimeGenerated, *) by Computer

Cause

This issue can be caused by a corrupt cache on the Hybrid Runbook Worker.

Resolution

To resolve this issue, sign in to the Hybrid Runbook Worker and run the following script. This script stops the Log Analytics agent for Windows, removes its cache, and restarts the service. This action forces the Hybrid Runbook Worker to re-download its configuration from Azure Automation.

Stop-Service -Name HealthService

Remove-Item -Path 'C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State' -Recurse

Start-Service -Name HealthService

Scenario: You can't add a Hybrid Runbook Worker

Issue

You receive the following message when you try to add a Hybrid Runbook Worker by using the Add-HybridRunbookWorker cmdlet:

Machine is already registered

Cause

This issue can occur if the machine is already registered with a different Automation account, or if you try to re-add the Hybrid Runbook Worker after removing it from a machine.

Resolution

To resolve this issue, remove the following registry key, restart HealthService, and try the Add-HybridRunbookWorker cmdlet again.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\HybridRunbookWorker

Next steps

If you don't see your problem here or you can't resolve your issue, try one of the following channels for additional support: