Troubleshoot runbook errors

This article describes various runbook errors that might occur and how to resolve them.

Diagnose runbook issues

When you receive errors during runbook execution in Azure Automation, you can use the following steps to help diagnose the issues:

  1. Ensure that your runbook script executes successfully on your local machine.

    For language reference and learning modules, see the PowerShell Docs or Python Docs. Running your script locally helps you find and fix common errors, such as:

    • Missing modules
    • Syntax errors
    • Logic errors
  2. Investigate runbook error streams.

    Look at these streams for specific messages, and compare them to the errors documented in this article.

  3. If your runbook is suspended or unexpectedly fails:

  4. Do this step if the runbook job or the environment on Hybrid Runbook Worker doesn't respond.

    If you're running your runbooks on a Hybrid Runbook Worker instead of in Azure Automation, you might need to troubleshoot the hybrid worker itself.
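
As a rough local pre-flight check for the missing-modules case in step 1, the following sketch lists which of a script's required modules aren't installed on the machine. The function name `Get-MissingModule` and the module names are hypothetical; substitute the modules that your runbook imports.

```powershell
# Hypothetical helper: returns the names of required modules that are not installed locally.
function Get-MissingModule {
    param(
        [Parameter(Mandatory)]
        [string[]] $Name
    )

    foreach ($moduleName in $Name) {
        # Get-Module -ListAvailable returns nothing when no installed module matches the name.
        if (-not (Get-Module -ListAvailable -Name $moduleName)) {
            $moduleName
        }
    }
}

# Example call; both names are placeholders for the modules your runbook imports.
$missing = @(Get-MissingModule -Name 'Microsoft.PowerShell.Utility', 'Contoso.DoesNotExist')
$missing | ForEach-Object { Write-Warning "Module not installed: $_" }
```

A module that's present locally can still be missing from the Automation account, so treat this only as a first check.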

Scenario: Runbook fails with a No permission or Forbidden 403 error

Issue

Your runbook fails with a No permission or Forbidden 403 error, or equivalent.

Cause

Run As accounts might not have the same permissions against Azure resources as your current Automation account.

Resolution

Ensure that your Run As account has permissions to access any resources used in your script.

Scenario: Sign-in to Azure account failed

Issue

You receive one of the following errors when you work with the Connect-AzAccount cmdlet:

Unknown_user_type: Unknown User Type
No certificate was found in the certificate store with thumbprint

Cause

These errors occur if the credential asset name isn't valid. They might also occur if the user name and password that you used to set up the Automation credential asset aren't valid.

Resolution

To determine what's wrong, follow these steps:

  1. Make sure that the Automation credential asset name that you're using to connect to Azure doesn't contain any special characters, including the @ character.

  2. Check whether you can use the user name and password that are stored in the Azure Automation credential in your local PowerShell ISE editor. Run the following cmdlets in the PowerShell ISE.

    $Cred = Get-Credential
    #Using Azure Service Management
    Add-AzureAccount -Environment AzureChinaCloud -Credential $Cred  
    #Using Azure Resource Manager
    Connect-AzAccount -EnvironmentName AzureChinaCloud -Credential $Cred
    
  3. If your authentication fails locally, you haven't set up your Azure Active Directory (Azure AD) credentials properly. To get the Azure AD account set up correctly, see the blog post Authenticating to Azure using Azure Active Directory.

  4. If the error appears to be transient, try adding retry logic to your authentication routine to make authenticating more robust.

    # Get the connection "AzureRunAsConnection"
    $connectionName = "AzureRunAsConnection"
    $servicePrincipalConnection = Get-AutomationConnection -Name $connectionName
    
    $logonAttempt = 0
    $connectionResult = $null
    
    # Retry until sign-in succeeds or the attempt limit is reached
    while (!($connectionResult) -and ($logonAttempt -le 10))
    {
        $logonAttempt++
        # Logging in to Azure...
        $connectionResult = Connect-AzureRmAccount `
                               -EnvironmentName AzureChinaCloud `
                               -ServicePrincipal `
                               -TenantId $servicePrincipalConnection.TenantId `
                               -ApplicationId $servicePrincipalConnection.ApplicationId `
                               -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint
    
        # Wait before the next attempt only if this one failed
        if (!($connectionResult)) { Start-Sleep -Seconds 30 }
    }
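
The same retry idea can be factored into a reusable helper. The following is a minimal sketch, not part of Azure Automation: the function name `Invoke-WithRetry`, its parameters, and the doubling delay are all assumptions made for illustration. The usage example simulates a flaky action so that it runs without an Azure connection.

```powershell
# Hypothetical helper: runs a script block until it succeeds or the attempts run out.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)]
        [scriptblock] $Action,
        [int] $MaxAttempts = 5,
        [int] $InitialDelaySeconds = 2
    )

    $delay = $InitialDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Return the action's output as soon as it completes without throwing.
            return (& $Action)
        }
        catch {
            if ($attempt -eq $MaxAttempts) {
                # Out of attempts: surface the last error to the caller.
                throw
            }
            Write-Verbose "Attempt $attempt failed: $($_.Exception.Message). Retrying in $delay seconds."
            Start-Sleep -Seconds $delay
            # Double the wait before the next attempt.
            $delay *= 2
        }
    }
}

# Usage with a simulated flaky action (fails twice, then succeeds).
$script:calls = 0
$result = Invoke-WithRetry -InitialDelaySeconds 0 -Action {
    $script:calls++
    if ($script:calls -lt 3) { throw "Simulated transient failure" }
    "Connected on attempt $script:calls"
}
$result
```

The helper retries only on terminating errors, so a sign-in cmdlet passed as the action would need `-ErrorAction Stop` for a failure to trigger another attempt.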
    

Scenario: Unable to find the Azure subscription

Issue

You receive the following error when you work with the Select-AzureSubscription or Select-AzureRMSubscription cmdlet:

The subscription named <subscription name> cannot be found.

Cause

This error can occur if:

  • The subscription name isn't valid.
  • The Azure AD user who's trying to get the subscription details isn't configured as an administrator of the subscription.
  • The cmdlet isn't available.

Resolution

Follow these steps to determine if you've authenticated to Azure and have access to the subscription that you're trying to select:

  1. To make sure that your script works standalone, test it outside of Azure Automation.

  2. Make sure that your script runs the Connect-AzureRmAccount cmdlet before running the Select-* cmdlet.

  3. If you still see the error message, modify your code to capture the context after Connect-AzureRmAccount and pass it explicitly to the cmdlets that follow by using the AzureRmContext parameter, and then execute the code. (The AzContext alias belongs to the Az module cmdlets, not to the AzureRm cmdlets used here.)

    # Sign in with the Run As service principal
    $Conn = Get-AutomationConnection -Name AzureRunAsConnection
    Connect-AzureRmAccount -ServicePrincipal -TenantId $Conn.TenantID -ApplicationId $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint
    
    # Capture the context that the sign-in created
    $context = Get-AzureRmContext
    
    # Pass the context explicitly to later cmdlets
    Get-AzureRmVM -ResourceGroupName myResourceGroup -AzureRmContext $context
    

Scenario: Authentication to Azure fails because multifactor authentication is enabled

Issue

You receive the following error when authenticating to Azure with your Azure user name and password:

Add-AzureAccount: AADSTS50079: Strong authentication enrollment (proof-up) is required

Cause

If you have multifactor authentication on your Azure account, you can't use an Azure Active Directory user to authenticate to Azure. Instead, you need to use a certificate or a service principal to authenticate.

Resolution

To use a certificate with Azure classic deployment model cmdlets, see Creating and adding a certificate to manage Azure services. To use a service principal with Azure Resource Manager cmdlets, see Creating service principal using Azure portal and Authenticating a service principal with Azure Resource Manager.

Scenario: Object reference not set to an instance of an object

Issue

You receive the following error when you invoke a child runbook with the Wait parameter and the Output stream contains an object:

Object reference not set to an instance of an object

Cause

If the stream contains objects, Start-AzureRmAutomationRunbook doesn't handle the Output stream correctly.

Resolution

Implement polling logic, and use the Get-AzureRmAutomationJobOutput cmdlet to retrieve the output. A sample of this logic is defined here:

$automationAccountName = "ContosoAutomationAccount"
$runbookName = "ChildRunbookExample"
$resourceGroupName = "ContosoRG"

function IsJobTerminalState([string] $status) {
    return $status -eq "Completed" -or $status -eq "Failed" -or $status -eq "Stopped" -or $status -eq "Suspended"
}

$job = Start-AzureRmAutomationRunbook -AutomationAccountName $automationAccountName -Name $runbookName -ResourceGroupName $resourceGroupName
$pollingSeconds = 5
$maxTimeout = 10800
$waitTime = 0
while((IsJobTerminalState $job.Status) -eq $false -and $waitTime -lt $maxTimeout) {
   Start-Sleep -Seconds $pollingSeconds
   $waitTime += $pollingSeconds
   $job = $job | Get-AzureRmAutomationJob
}

# Retrieve the output records of the finished job
$job | Get-AzureRmAutomationJobOutput | Get-AzureRmAutomationJobOutputRecord | Select-Object -ExpandProperty Value

Scenario: Runbook fails because of deserialized object

Issue

Your runbook fails with the error:

Cannot bind parameter <ParameterName>.

Cannot convert the <ParameterType> value of type Deserialized <ParameterType> to type <ParameterType>.

Cause

If your runbook is a PowerShell Workflow, it stores complex objects in a deserialized format to persist your runbook state if the workflow is suspended.

Resolution

Use any of the following solutions to fix this problem:

  • If you're piping complex objects from one cmdlet to another, wrap these cmdlets in an InlineScript activity.
  • Pass the name or value that you need from the complex object instead of passing the entire object.
  • Use a PowerShell runbook instead of a PowerShell Workflow runbook.
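
To see why a deserialized object no longer binds to a typed parameter, the following sketch round-trips a live object through PowerShell's serializer. It assumes that workflow persistence behaves like `PSSerializer` in this respect, and it runs locally in a regular PowerShell session rather than in a workflow.

```powershell
# Round-trip a live object through PowerShell's serializer to mimic what a checkpoint does.
$live = Get-Item -Path $PSHOME   # a System.IO.DirectoryInfo object
$xml = [System.Management.Automation.PSSerializer]::Serialize($live)
$restored = [System.Management.Automation.PSSerializer]::Deserialize($xml)

# The restored object is a property bag whose type name carries the "Deserialized." prefix,
# so it no longer binds to parameters that expect the original .NET type.
$live.GetType().FullName
$restored.PSObject.TypeNames[0]

# Simple values survive the round trip, which is why passing a name or path is safer.
$path = $restored.FullName
$path
```

The restored object keeps its property values but not its methods or its original type, which matches the second resolution above: pass the name or value that you need rather than the whole object.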

Scenario: 400 Bad Request status when calling a webhook

Issue

When you try to invoke a webhook for an Azure Automation runbook, you receive the following error:

400 Bad Request : This webhook has expired or is disabled

Cause

The webhook that you're trying to call is either disabled or expired.

Resolution

If the webhook is disabled, you can reenable it through the Azure portal. If the webhook has expired, you must delete and then re-create it. You can renew a webhook only if it hasn't already expired.

Scenario: 429: The request rate is currently too large

Issue

You receive the following error message when running the Get-AzureRmAutomationJobOutput cmdlet:

429: The request rate is currently too large. Please try again

Cause

This error can occur when retrieving job output from a runbook that has many verbose streams.

Resolution

Do one of the following to resolve this error:

  • Edit the runbook, and reduce the number of job streams that it emits.
  • Reduce the number of streams to be retrieved when running the cmdlet. To do this, set the Stream parameter of the Get-AzureRmAutomationJobOutput cmdlet to retrieve only Output streams.

Scenario: Runbook job fails because allocated quota was exceeded

Issue

Your runbook job fails with the error:

The quota for the monthly total job run time has been reached for this subscription

Scenario: Runbook job start attempted three times, but fails to start each time

Issue

Your runbook fails with the following error:

The job was tried three times but it failed

Cause

This error occurs because of one of the following issues:

  • Memory limit. A job might fail if it's using more than 400 MB of memory. The documented limits on memory allocated to a sandbox are found at Automation service limits.
  • Network sockets. Azure sandboxes are limited to 1,000 concurrent network sockets. For more information, see Automation service limits.
  • No authentication with Active Directory for sandbox. Your runbook attempted to call an executable or subprocess that runs in an Azure sandbox. Configuring runbooks to authenticate with Azure AD by using the Azure Active Directory Authentication Library (ADAL) isn't supported.
  • Too much exception data. Your runbook attempted to write too much exception data to the output stream.

Resolution

  • Memory limit, network sockets. Suggested ways to work within the memory limits are to split the workload among multiple runbooks, process less data in memory, avoid writing unnecessary output from your runbooks, and consider how many checkpoints are written into your PowerShell Workflow runbooks. Call the Clear method, such as $myVar.Clear(), to clear out variables, and call [GC]::Collect() to run garbage collection immediately. These actions reduce the memory footprint of your runbook during runtime.

  • No authentication with Active Directory for sandbox. When you authenticate to Azure AD with a runbook, ensure that the Azure AD module is available in your Automation account. Be sure to grant the Run As account the necessary permissions to perform the tasks that the runbook automates.

    If your runbook can't call an executable or subprocess running in an Azure sandbox, use the runbook on a Hybrid Runbook Worker. Hybrid workers aren't limited by the memory and network limits that Azure sandboxes have.

  • Too much exception data. There's a 1-MB limit on the job output stream. Ensure that your runbook encloses calls to an executable or subprocess in try and catch blocks. If the operations throw an exception, have the code write the message from the exception into an Automation variable. This technique prevents the message from being written into the job output stream.
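
The try and catch approach for exception data can be sketched as follows. The helper `Invoke-Guarded`, its parameters, and the 512-character cap are hypothetical choices for illustration. The error sink here is an ordinary script block so that the example runs locally; in a runbook, the sink might instead write the message to an Automation variable.

```powershell
# Hypothetical helper: runs an action and hands a short error summary to a sink instead of
# letting the full exception text reach the output stream.
function Invoke-Guarded {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [Parameter(Mandatory)] [scriptblock] $ErrorSink,
        [int] $MaxMessageLength = 512
    )

    try {
        & $Action
    }
    catch {
        $message = $_.Exception.Message
        if ($message.Length -gt $MaxMessageLength) {
            $message = $message.Substring(0, $MaxMessageLength)
        }
        # In a runbook the sink might call Set-AutomationVariable; here it is any script block.
        & $ErrorSink $message
    }
}

# Stand-in sink that stores the message in a script-scoped variable.
$script:lastError = $null
Invoke-Guarded -Action { throw ('x' * 5000) } -ErrorSink { param($m) $script:lastError = $m }
$script:lastError.Length
```

Capping the message length is an extra precaution beyond what the text describes. It keeps one oversized exception from filling whatever store receives it.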

Scenario: PowerShell job fails with "Cannot invoke method" error message

Issue

You receive the following error message when you start a PowerShell job in a runbook that runs in Azure:

Exception was thrown - Cannot invoke method. Method invocation is supported only on core types in this language mode.

Cause

This error might indicate that runbooks that run in an Azure sandbox can't run in the Full Language mode.
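
One way to confirm which language mode a session is using is to read it from the session state. The following is a small diagnostic sketch; where to run it (for example, at the top of the job script) is an assumption.

```powershell
# Report the language mode of the current session.
$mode = $ExecutionContext.SessionState.LanguageMode
"Current language mode: $mode"

if ($mode -ne 'FullLanguage') {
    # Method calls on non-core types are blocked outside FullLanguage mode.
    Write-Warning "Session is running in $mode mode; some method invocations will be rejected."
}
```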

Resolution

There are two ways to resolve this error:

To learn more about this behavior and other behaviors of Azure Automation runbooks, see Runbook execution in Azure Automation.

Scenario: A long-running runbook fails to complete

Issue

Your runbook shows in a Stopped state after running for three hours. You might also receive this error:

The job was evicted and subsequently reached a Stopped state. The job cannot continue running.

This behavior is by design in Azure sandboxes because of the fair share monitoring of processes within Azure Automation. If a process executes longer than three hours, fair share automatically stops a runbook. The status of a runbook that goes past the fair share time limit differs by runbook type. PowerShell and Python runbooks are set to a Stopped status. PowerShell Workflow runbooks are set to Failed.

Cause

The runbook ran over the three-hour limit allowed by fair share in an Azure sandbox.

Resolution

One recommended solution is to run the runbook on a Hybrid Runbook Worker. Hybrid workers aren't limited by the three-hour fair share runbook limit that Azure sandboxes have. Runbooks that run on Hybrid Runbook Workers should be developed to support restart behaviors if there are unexpected local infrastructure issues.

Another solution is to optimize the runbook by creating child runbooks. If your runbook loops through the same function on several resources, for example, in a database operation on several databases, you can move the function to a child runbook. Each child runbook executes in parallel in a separate process. This behavior decreases the total amount of time for the parent runbook to complete.
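
The fan-out pattern behind child runbooks can be tried locally with background jobs. This sketch uses `Start-Job` only as a stand-in for starting child runbooks, and the database names are placeholders. It illustrates the shape of the parent logic, not the Azure Automation cmdlets themselves.

```powershell
# Placeholder resource names; in a runbook these would be the databases to process.
$databases = 'db1', 'db2', 'db3'

# Start one background job per resource so the work runs in separate processes.
$jobs = foreach ($db in $databases) {
    Start-Job -ArgumentList $db -ScriptBlock {
        param($name)
        # Stand-in for the per-database operation.
        "Processed $name"
    }
}

# Wait for every job, collect the results, then clean up.
$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$results
```

As with child runbooks, the parent here only starts the work and waits for it, so the elapsed time is closer to that of the slowest unit than to the sum of all units.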

The PowerShell cmdlets that enable the child runbook scenario are:

Scenario: Access denied when using Azure sandbox for runbook or application

Issue

When your runbook or application attempts to run in an Azure sandbox, the environment denies access.

Cause

This issue can occur because Azure sandboxes prevent access to all out-of-process COM servers. For example, a sandboxed application or runbook can't call into Windows Management Instrumentation (WMI) or into the Windows Installer service (msiserver.exe).

Resolution

For details about the use of Azure sandboxes, see Runbook execution environment.

Scenario: Invalid Forbidden status code when using Key Vault inside a runbook

Issue

When you try to access Azure Key Vault through an Azure Automation runbook, you get the following error:

Operation returned an invalid status code 'Forbidden'

Cause

Possible causes for this issue are:

  • Not using a Run As account.
  • Insufficient permissions.

Resolution

Not using a Run As account

Follow Step 5 - Add authentication to manage Azure resources to ensure that you're using a Run As account to access Key Vault.

Insufficient permissions

Add permissions to Key Vault to ensure that your Run As account has sufficient permissions to access Key Vault.

Next steps

If you didn't see your problem here or you're unable to solve your issue, try one of the following channels for more support: