Troubleshoot Azure Automation State Configuration issues

This article provides information on troubleshooting and resolving issues that arise while you compile or deploy configurations in Azure Automation State Configuration. For general information about the State Configuration feature, see Azure Automation State Configuration overview.

Diagnose an issue

When you receive a compilation or deployment error for a configuration, the following steps can help you diagnose the issue.

1. Ensure that your configuration compiles successfully on the local machine

Azure Automation State Configuration is built on PowerShell Desired State Configuration (DSC). You can find the documentation for the DSC language and syntax in the PowerShell DSC Docs.

By compiling a DSC configuration on your local machine, you can discover and resolve common errors, such as:

  • Missing modules.
  • Syntax errors.
  • Logic errors.
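
As a minimal sketch of a local compile, the following defines a small configuration and compiles it to a MOF file. The configuration name SampleConfig, the File resource, and the paths are hypothetical and chosen only for illustration; the sketch assumes Windows PowerShell 5.1 with the built-in PSDesiredStateConfiguration module.

```powershell
# Hypothetical sample configuration; the name, resource, and paths are illustrative only.
Configuration SampleConfig
{
    Import-DscResource -ModuleName 'PSDesiredStateConfiguration'

    Node 'localhost'
    {
        File ExampleFolder
        {
            DestinationPath = 'C:\ExampleFolder'
            Type            = 'Directory'
            Ensure          = 'Present'
        }
    }
}

# Invoking the configuration compiles it and writes localhost.mof to the output folder.
# Missing modules and syntax errors are reported at this point.
SampleConfig -OutputPath (Join-Path $env:TEMP 'SampleConfig')
```

If the call completes and a .mof file appears in the output folder, the configuration compiles locally, and any remaining problem is more likely tied to the service or the node.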

2. View DSC logs on your node

If your configuration compiles successfully but fails when applied to a node, you can find detailed information in the DSC logs. For information about where to find these logs, see Where are the DSC Event Logs.
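
As a quick first look on a Windows node, the DSC operational event log can be read directly with Get-WinEvent. This is a sketch that assumes the standard log name Microsoft-Windows-DSC/Operational; the number of events shown is arbitrary.

```powershell
# Read the 20 most recent entries from the DSC operational log on the node.
Get-WinEvent -LogName 'Microsoft-Windows-DSC/Operational' -MaxEvents 20 |
    Select-Object TimeCreated, LevelDisplayName, Id, Message |
    Format-List
```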

The xDscDiagnostics module can assist you in parsing detailed information from the DSC logs. If you contact support, they require these logs to diagnose your issue.

You can install the xDscDiagnostics module on your local machine by following the instructions in Install the stable version module.

To install the xDscDiagnostics module on your Azure machine, use Invoke-AzVMRunCommand. You can also use the Run command option in the Azure portal by following the steps in Run PowerShell scripts in your Windows VM with Run Command.
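
The remote installation could look like the following sketch. It assumes the Az.Compute module is installed and that you have already signed in with Connect-AzAccount. The resource group name, VM name, and script file name are placeholders, and installing the NuGet provider first is an assumption about what a fresh VM might need.

```powershell
# Placeholders: replace the resource group and VM names with your own.
$resourceGroup = 'MyResourceGroup'
$vmName        = 'MyVM'

# Write a small script that installs the module, then run it on the VM.
$scriptPath = Join-Path $env:TEMP 'Install-xDscDiagnostics.ps1'
Set-Content -Path $scriptPath -Value @'
Install-PackageProvider -Name NuGet -Force
Install-Module -Name xDscDiagnostics -Force
'@

Invoke-AzVMRunCommand -ResourceGroupName $resourceGroup -VMName $vmName `
    -CommandId 'RunPowerShellScript' -ScriptPath $scriptPath
```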

For information on using xDscDiagnostics, see Using xDscDiagnostics to analyze DSC logs. See also xDscDiagnostics Cmdlets.
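
A typical session with the module might look like this sketch, run on the node in an elevated prompt. The sequence ID passed to Trace-xDscOperation is an example value; take the real one from the Get-xDscOperation output.

```powershell
Import-Module xDscDiagnostics

# List the most recent DSC operations with their sequence IDs and results.
Get-xDscOperation -Newest 5

# Show the detailed events for one operation.
# The value 1 is an example; use a sequence ID from the list above.
Trace-xDscOperation -SequenceID 1
```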

3. Ensure that nodes and the Automation workspace have required modules

DSC depends on modules installed on the node. When you use Azure Automation State Configuration, import any required modules into your Automation account by following the steps in Import Modules. Configurations can also depend on specific versions of modules. For more information, see Troubleshoot modules.
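
To check whether a module, and which version of it, is present in the Automation account, a sketch along these lines can help. The resource group, account, and module names are placeholders, and the sketch assumes the Az.Automation module and a signed-in session.

```powershell
# Placeholders: replace with your own resource group, account, and module names.
$resourceGroup = 'MyResourceGroup'
$account       = 'MyAutomationAccount'

# Show the imported module together with its version and provisioning state.
Get-AzAutomationModule -ResourceGroupName $resourceGroup -AutomationAccountName $account -Name 'xWebAdministration' |
    Select-Object Name, Version, ProvisioningState
```

If the version shown differs from the one the configuration was written against, that mismatch is a likely source of compilation or deployment errors.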

Scenario: A configuration with special characters can't be deleted from the portal

Issue

When you attempt to delete a DSC configuration from the portal, you see the following error:

An error occurred while deleting the DSC configuration '<name>'.  Error-details: The argument configurationName with the value <name> is not valid.  Valid configuration names can contain only letters,  numbers, and underscores.  The name must start with a letter.  The length of the name must be between 1 and 64 characters.

Cause

This error is a temporary issue that's planned to be resolved.

Resolution

Use the [Remove-AzAutomationDscConfiguration](https://docs.microsoft.com/powershell/module/Az.Automation/Remove-AzAutomationDscConfiguration?view=azps-3.7.0) cmdlet to delete the configuration.
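
A call might look like the following sketch. All three names are placeholders; substitute the configuration name exactly as it appears in the portal.

```powershell
# Placeholders: replace the resource group, account, and configuration names.
Remove-AzAutomationDscConfiguration -ResourceGroupName 'MyResourceGroup' `
    -AutomationAccountName 'MyAutomationAccount' `
    -Name 'My-Config-Name' -Force
```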

Scenario: Failed to register the DSC Agent

Issue

When you run Set-DscLocalConfigurationManager or another DSC cmdlet, you receive the error:

Registration of the Dsc Agent with the server
https://<location>-agentservice-prod-1.azure-automation.cn/accounts/00000000-0000-0000-0000-000000000000 failed. The
underlying error is: Failed to register Dsc Agent with AgentId 00000000-0000-0000-0000-000000000000 with the server htt
ps://<location>-agentservice-prod-1.azure-automation.cn/accounts/00000000-0000-0000-0000-000000000000/Nodes(AgentId='00000000-0000-0000-0000-000000000000'). .
    + CategoryInfo          : InvalidResult: (root/Microsoft/...gurationManager:String) [], CimException
    + FullyQualifiedErrorId : RegisterDscAgentCommandFailed,Microsoft.PowerShell.DesiredStateConfiguration.Commands.Re
   gisterDscAgentCommand
    + PSComputerName        : <computerName>

Cause

This error is normally caused by a firewall, the machine being behind a proxy server, or other network errors.

Resolution

Verify that your machine has access to the proper endpoints for DSC and try again. For a list of ports and addresses needed, see Network planning.
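
A basic reachability check from the affected machine could look like this sketch. The host name follows the pattern shown in the error message, with <location> left as a placeholder for your region. Port 443 is assumed because the registration URL uses HTTPS.

```powershell
# Replace <location> with the region of your Automation account, as shown in the error message.
$endpoint = '<location>-agentservice-prod-1.azure-automation.cn'

# TcpTestSucceeded in the output indicates whether the HTTPS port is reachable.
Test-NetConnection -ComputerName $endpoint -Port 443
```

A failed test points to a firewall or proxy rule rather than to DSC itself.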

Scenario: Status reports return the response code Unauthorized

Issue

When you register a node with Azure Automation State Configuration, you receive one of the following error messages:

The attempt to send status report to the server https://{your Automation account URL}/accounts/xxxxxxxxxxxxxxxxxxxxxx/Nodes(AgentId='xxxxxxxxxxxxxxxxxxxxxxxxx')/SendReport returned unexpected response code Unauthorized.
VM has reported a failure when processing extension 'Microsoft.Powershell.DSC / Registration of the Dsc Agent with the server failed.

Cause

This issue is caused by a bad or expired certificate. See Re-register a node.

This issue might also be caused by a proxy configuration that doesn't allow access to *.azure-automation.cn. For more information, see Configuration of private networks.

Resolution

Use the following steps to reregister the failing DSC node.

Step 1: Unregister the node

  1. In the Azure portal, go to Home > Automation Accounts > (your Automation account) > State configuration (DSC).
  2. Select Nodes, and select the node having trouble.
  3. Select Unregister to unregister the node.

Step 2: Uninstall the DSC extension from the node

  1. In the Azure portal, go to Home > Virtual Machine > (failing node) > Extensions.
  2. Select Microsoft.Powershell.DSC, the PowerShell DSC extension.
  3. Select Uninstall to uninstall the extension.

Step 3: Remove all bad or expired certificates from the node

On the failing node, run these commands from an elevated PowerShell prompt:

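# Collect the DSC-related certificates from the local machine certificate stores.
$certs = @()
$certs += Get-ChildItem Cert:\LocalMachine\My | Where-Object { $_.FriendlyName -like "DSC" }
$certs += Get-ChildItem Cert:\LocalMachine\My | Where-Object { $_.FriendlyName -like "DSC-OaaS Client Authentication" }
$certs += Get-ChildItem Cert:\LocalMachine\CA | Where-Object { $_.Subject -like "CN=AzureDSCExtension*" }

# Report what was found before removing anything.
""
"== DSC Certificates found: " + $certs.Count
$certs | Format-List Thumbprint, FriendlyName, Subject

# Remove each certificate that was found.
if ($certs.Count -gt 0)
{
    foreach ($cert in $certs)
    {
        Remove-Item -LiteralPath $cert.PSPath
    }
}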

Step 4: Reregister the failing node

  1. In the Azure portal, go to Home > Automation Accounts > (your Automation account) > State configuration (DSC).
  2. Select Nodes.
  3. Select Add.
  4. Select the failing node.
  5. Select Connect, and select your desired options.
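
For an Azure VM, steps 1, 2, and 4 can also be scripted. The following is a sketch rather than a replacement for the portal steps. It assumes the Az.Automation and Az.Compute modules and a signed-in session, that the node name matches the VM name, that the VM is in the same resource group as the Automation account, and that the extension was installed under the name Microsoft.Powershell.DSC. All names are placeholders.

```powershell
# Placeholders: replace with your own names.
$resourceGroup = 'MyResourceGroup'
$account       = 'MyAutomationAccount'
$vmName        = 'MyVM'

# Step 1: unregister the node from State Configuration.
$node = Get-AzAutomationDscNode -ResourceGroupName $resourceGroup -AutomationAccountName $account -Name $vmName
Unregister-AzAutomationDscNode -ResourceGroupName $resourceGroup -AutomationAccountName $account -Id $node.Id -Force

# Step 2: remove the DSC extension from the VM (extension name is an assumption).
Remove-AzVMExtension -ResourceGroupName $resourceGroup -VMName $vmName -Name 'Microsoft.Powershell.DSC' -Force

# Step 3 (certificate cleanup) runs on the node itself and is not repeated here.

# Step 4: register the VM again.
Register-AzAutomationDscNode -ResourceGroupName $resourceGroup -AutomationAccountName $account `
    -AzureVMName $vmName -AzureVMResourceGroup $resourceGroup
```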

Scenario: Node is in failed status with a "Not found" error

Issue

The node has a report with Failed status that contains the error:

The attempt to get the action from server https://<url>//accounts/<account-id>/Nodes(AgentId=<agent-id>)/GetDscAction failed because a valid configuration <guid> cannot be found.

Cause

This error typically occurs when the node is assigned to a configuration name, for example, ABC, instead of a node configuration (MOF file) name, for example, ABC.WebServer.

Resolution

  • Make sure that you're assigning the node configuration name to the node, and not the configuration name.

  • You can assign a node configuration to a node by using the Azure portal or a PowerShell cmdlet.

    • In the Azure portal, go to Home > Automation Accounts > (your Automation account) > State configuration (DSC). Then select a node and select Assign node configuration.
    • Use the Set-AzAutomationDscNode cmdlet.
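
With the cmdlet, the assignment might look like this sketch. The resource group, account, and node names are placeholders; ABC.WebServer is the example node configuration name used above.

```powershell
# Placeholders: replace with your own names.
$resourceGroup = 'MyResourceGroup'
$account       = 'MyAutomationAccount'

# Look up the node to get its ID, then assign the node configuration (not the configuration) name.
$node = Get-AzAutomationDscNode -ResourceGroupName $resourceGroup -AutomationAccountName $account -Name 'MyVM'
Set-AzAutomationDscNode -ResourceGroupName $resourceGroup -AutomationAccountName $account `
    -Id $node.Id -NodeConfigurationName 'ABC.WebServer' -Force
```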

Scenario: No node configurations (MOF files) were produced when a configuration was compiled

Issue

Your DSC compilation job suspends with the error:

Compilation completed successfully, but no node configuration **.mof** files were generated.

Cause

When the expression following the Node keyword in the DSC configuration evaluates to $null, no node configurations are produced.

Resolution

Use one of the following solutions to fix the problem:

  • Make sure that the expression next to the Node keyword in the configuration definition doesn't evaluate to $null.
  • If you're passing ConfigurationData when you compile the configuration, make sure that you're passing the values that the configuration expects from the configuration data.
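
The following sketch shows how the Node expression and the configuration data have to line up. The configuration name, the Role property, and the node name are hypothetical. If no entry in AllNodes has the role that the Node expression filters on, the expression yields nothing and no MOF file is produced.

```powershell
# Hypothetical configuration data: one node with a Role property.
$ConfigData = @{
    AllNodes = @(
        @{
            NodeName = 'WebServer01'
            Role     = 'WebServer'
        }
    )
}

Configuration SampleConfig
{
    Import-DscResource -ModuleName 'PSDesiredStateConfiguration'

    # This expression must resolve to at least one node name.
    Node $AllNodes.Where({ $_.Role -eq 'WebServer' }).NodeName
    {
        File ExampleFolder
        {
            DestinationPath = 'C:\ExampleFolder'
            Type            = 'Directory'
            Ensure          = 'Present'
        }
    }
}

# Check what the Node expression would select before compiling.
$ConfigData.AllNodes | Where-Object { $_.Role -eq 'WebServer' } | ForEach-Object { $_.NodeName }

# Compile locally with the same data that would be passed to the service.
SampleConfig -ConfigurationData $ConfigData -OutputPath (Join-Path $env:TEMP 'SampleConfig')
```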

Scenario: The DSC node report becomes stuck in the In Progress state

Issue

The DSC agent outputs:

No instance found with given property values

Cause

You've upgraded your Windows Management Framework (WMF) version, and the upgrade has corrupted Windows Management Instrumentation (WMI).

Resolution

Follow the instructions in DSC known issues and limitations.

Scenario: Unable to use a credential in a DSC configuration

Issue

Your DSC compilation job is suspended with the error:

System.InvalidOperationException error processing property 'Credential' of type <some resource name>: Converting and storing an encrypted password as plaintext is allowed only if PSDscAllowPlainTextPassword is set to true.

Cause

You've used a credential in a configuration but didn't provide proper ConfigurationData to set PSDscAllowPlainTextPassword to true for each node configuration.

Resolution

Make sure to pass in the proper ConfigurationData to set PSDscAllowPlainTextPassword to true for each node configuration that's mentioned in the configuration. See Compiling DSC configurations in Azure Automation State Configuration.
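
One way to supply that setting is sketched below. The node name MyVM1, the configuration name CredentialSample, and the resource group and account names are placeholders. The wildcard NodeName entry applies the setting to every node in AllNodes.

```powershell
# Configuration data that allows plain-text passwords for every node configuration.
$ConfigData = @{
    AllNodes = @(
        @{
            NodeName                    = '*'
            PSDscAllowPlainTextPassword = $true
        },
        @{
            NodeName = 'MyVM1'   # placeholder node name
        }
    )
}

# Start the compilation job in the service with that configuration data.
Start-AzAutomationDscCompilationJob -ResourceGroupName 'MyResourceGroup' `
    -AutomationAccountName 'MyAutomationAccount' `
    -ConfigurationName 'CredentialSample' `
    -ConfigurationData $ConfigData
```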

Scenario: "Failure processing extension" error when enabling a machine from a DSC extension

Issue

When you enable a machine by using a DSC extension, a failure occurs that contains the error:

VM has reported a failure when processing extension 'Microsoft.Powershell.DSC'. Error message: \"DSC COnfiguration 'RegistrationMetaConfigV2' completed with error(s). Following are the first few: Registration of the Dsc Agent with the server <url> failed. The underlying error is: The attempt to register Dsc Agent with Agent Id <ID> with the server <url> return unexpected response code BadRequest. .\".

Cause

This error typically occurs when the node is assigned a node configuration name that doesn't exist in the service.

Resolution

  • Make sure that the node configuration name you assign to the node exactly matches the name in the service.
  • You can choose to not include the node configuration name, which results in enabling the node but not assigning a node configuration.

Scenario: "Provisioning has failed" error message

Issue

When you register a node, you see the error:

Provisioning has failed

Cause

This message occurs when there's an issue with connectivity between the node and Azure.

Resolution

Determine if your node is in a virtual private network (VPN) or has other issues connecting to Azure. See Troubleshoot feature deployment issues.

Scenario: Failure with a general error when applying a configuration in Linux

Issue

When you apply a configuration in Linux, a failure occurs that contains the error:

This event indicates that failure happens when LCM is processing the configuration. ErrorId is 1. ErrorDetail is The SendConfigurationApply function did not succeed.. ResourceId is [resource]name and SourceInfo is ::nnn::n::resource. ErrorMessage is A general error occurred, not covered by a more specific error code..

Cause

If the /tmp location is set to noexec, the current version of DSC fails to apply configurations.

Resolution

Remove the noexec option from the /tmp location.

Scenario: Node configuration names that overlap can result in a bad release

Issue

When you use a single configuration script to generate multiple node configurations and some node configuration names are subsets of other names, the compilation service can end up assigning the wrong configuration. This issue only occurs when you use a single script to generate configurations with configuration data per node, and only when the name overlap occurs at the beginning of the string. An example is a single configuration script used to generate configurations based on node data passed as a hashtable using cmdlets, where the node data includes servers named server and 1server.

Cause

This is a known issue with the compilation service.

Resolution

The best workaround is to compile locally or in a CI/CD pipeline and upload the node configuration MOF files directly to the service. If compilation in the service is a requirement, the next best workaround is to split the compilation jobs so that there's no overlap in names.
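
Uploading a locally compiled MOF file might look like the following sketch. The path, configuration name, resource group, and account names are placeholders, and the sketch assumes the MOF was produced by compiling a configuration named SampleConfig for a node named server.

```powershell
# Placeholders: replace the names and the path to the locally compiled MOF file.
Import-AzAutomationDscNodeConfiguration -ResourceGroupName 'MyResourceGroup' `
    -AutomationAccountName 'MyAutomationAccount' `
    -ConfigurationName 'SampleConfig' `
    -Path 'C:\Dsc\SampleConfig\server.mof' `
    -Force
```

Because each MOF file is uploaded under its own name, the service doesn't have to match node names against compilation output, which is where the overlap problem arises.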

Scenario: Gateway timeout error on DSC configuration upload

Issue

You receive a GatewayTimeout error when you upload a DSC configuration.

Cause

DSC configurations that take a long time to compile can cause this error.

Resolution

You can make your DSC configurations parse faster by explicitly including the ModuleName parameter for any Import-DSCResource calls.
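
In a configuration, that looks like the sketch below. The configuration name and the Service resource are illustrative only; the relevant part is that Import-DscResource names the module explicitly instead of importing resources by name alone.

```powershell
Configuration FastParseSample
{
    # Slower form, for comparison: Import-DscResource -Name Service
    # Explicit form: name the module that contains the resources.
    Import-DscResource -ModuleName 'PSDesiredStateConfiguration'

    Node 'localhost'
    {
        Service ExampleService
        {
            Name  = 'Spooler'
            State = 'Running'
        }
    }
}
```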

Next steps

If you don't see your problem here or you can't resolve your issue, try one of the following channels for additional support: