Troubleshoot errors when using Azure Policy

When you create policy definitions, work with SDKs, or set up the Azure Policy for Kubernetes add-on, you might run into errors. This article describes various general errors that might occur and suggests ways to resolve them.

Find error details

The location of the error details depends on which part of Azure Policy you're working with.

  • If you're working with a custom policy, go to the Azure portal to get linting feedback about the schema, or review the resulting compliance data to see how resources were evaluated.
  • If you're working with one of the SDKs, the SDK provides details about why the function failed.
  • If you're working with the add-on for Kubernetes, start with the logs in the cluster.

General errors

Scenario: Alias not found

Issue

An incorrect or nonexistent alias is used in a policy definition. Azure Policy uses aliases to map to Azure Resource Manager properties.

Cause

An incorrect or nonexistent alias is used in a policy definition.

Resolution

First, validate that the Resource Manager property has an alias. To look up the available aliases, use the Azure Policy extension for Visual Studio Code or one of the SDKs. If the alias for a Resource Manager property doesn't exist, create a support ticket.
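If you have a resource provider's metadata as JSON (for example, the response from the Resource Manager Providers - Get REST operation called with `$expand=resourceTypes/aliases`), a small filter makes it easier to check whether a property has an alias. The Python sketch below assumes that response shape: a `resourceTypes` array whose entries contain an `aliases` array of objects with a `name` field. The `find_aliases` helper and the trimmed sample payload are illustrative and aren't part of any Azure SDK.

```python
def find_aliases(provider: dict, keyword: str) -> list:
    """Return alias names from a provider payload that contain `keyword` (case-insensitive)."""
    needle = keyword.lower()
    matches = []
    for resource_type in provider.get("resourceTypes", []):
        for alias in resource_type.get("aliases", []):
            name = alias.get("name", "")
            if needle in name.lower():
                matches.append(name)
    return matches


# Illustrative, trimmed-down payload; a real response contains many more fields.
sample_provider = {
    "namespace": "Microsoft.Storage",
    "resourceTypes": [
        {
            "resourceType": "storageAccounts",
            "aliases": [
                {"name": "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly"},
                {"name": "Microsoft.Storage/storageAccounts/minimumTlsVersion"},
            ],
        }
    ],
}

print(find_aliases(sample_provider, "tls"))
```

An empty result for the property you need suggests that no alias exists for it, which is the case where a support ticket is the next step.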

Scenario: Evaluation details aren't up to date

Issue

A resource is in the Not Started state, or the compliance details aren't current.

Cause

A new policy or initiative assignment takes about 30 minutes to be applied. Results for new or updated resources within the scope of an existing assignment become available in about 15 minutes. A standard compliance scan occurs every 24 hours. For more information, see evaluation triggers.

Resolution

First, wait an appropriate amount of time for an evaluation to finish and for compliance results to become available in the Azure portal or the SDK. To start a new evaluation scan with Azure PowerShell or the REST API, see On-demand evaluation scan.
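With the REST API, an on-demand scan is a POST to the `triggerEvaluation` action of `Microsoft.PolicyInsights` at subscription or resource group scope (Azure PowerShell exposes the same operation as `Start-AzPolicyComplianceScan`). The sketch below only builds the request URL. The `2019-10-01` API version is an assumption to verify against the current Policy Insights REST reference, and sending the request with a bearer token is left to your HTTP client.

```python
from typing import Optional
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2019-10-01"  # assumption: confirm against the current REST reference


def trigger_evaluation_url(subscription_id: str, resource_group: Optional[str] = None) -> str:
    """Build the URL for an on-demand compliance scan at subscription or resource group scope."""
    scope = f"/subscriptions/{quote(subscription_id)}"
    if resource_group:
        scope += f"/resourceGroups/{quote(resource_group)}"
    return (
        f"{ARM_ENDPOINT}{scope}/providers/Microsoft.PolicyInsights"
        f"/policyStates/latest/triggerEvaluation?api-version={API_VERSION}"
    )


print(trigger_evaluation_url("00000000-0000-0000-0000-000000000000", "my-rg"))
```

The scan runs asynchronously, so expect the call to return before evaluation finishes; the waiting guidance above still applies.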

Scenario: Compliance isn't as expected

Issue

A resource isn't in the evaluation state, either Compliant or Non-compliant, that you expect for it.

Cause

The resource isn't in the correct scope for the policy assignment, or the policy definition doesn't operate as intended.

Resolution

To troubleshoot your policy definition, do the following:

  1. First, wait an appropriate amount of time for an evaluation to finish and for compliance results to become available in the Azure portal or the SDK.

  2. To start a new evaluation scan with Azure PowerShell or the REST API, see On-demand evaluation scan.

  3. Ensure that the assignment parameters and assignment scope are set correctly.

  4. Check the policy definition mode:

    • The mode should be all for all resource types.
    • The mode should be indexed if the policy definition checks for tags or location.
  5. Ensure that the scope of the resource isn't excluded or exempt.

  6. If compliance for a policy assignment shows 0/0 resources, no resources were determined to be applicable within the assignment scope. Check both the policy definition and the assignment scope.

  7. For a noncompliant resource that was expected to be compliant, see determine the reasons for noncompliance. Comparing the definition to the evaluated property value shows why the resource is noncompliant.

    • If the target value is wrong, revise the policy definition.
  8. For other common issues and solutions, see Troubleshoot: Enforcement not as expected.
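The mode check in step 4 can be partly automated. The following Python sketch walks a definition's `policyRule` and flags definitions whose `field` conditions reference `tags` or `location` while `mode` isn't `indexed`. It's a heuristic under stated assumptions: it compares mode names case-insensitively, only inspects `field` keys (not expressions such as `[field('tags')]` inside `value` conditions), and the sample definition is illustrative.

```python
def referenced_fields(node):
    """Yield every string `field` value found anywhere in a policy rule."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "field" and isinstance(value, str):
                yield value
            else:
                yield from referenced_fields(value)
    elif isinstance(node, list):
        for item in node:
            yield from referenced_fields(item)


def is_tag_or_location(field: str) -> bool:
    f = field.lower()
    return f in ("location", "tags") or f.startswith("tags[") or f.startswith("tags.")


def check_mode(definition: dict) -> list:
    """Return warnings when a rule checks tags/location but mode isn't 'indexed' (heuristic)."""
    props = definition.get("properties", definition)
    mode = props.get("mode", "")
    hits = sorted({f for f in referenced_fields(props.get("policyRule", {})) if is_tag_or_location(f)})
    if hits and mode.lower() != "indexed":
        return [f"mode is '{mode}' but the rule checks {', '.join(hits)}; consider 'indexed'"]
    return []


sample = {
    "properties": {
        "mode": "All",
        "policyRule": {
            "if": {"field": "tags['costCenter']", "exists": "false"},
            "then": {"effect": "audit"},
        },
    }
}
print(check_mode(sample))
```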

If you still have an issue with a duplicated and customized built-in policy definition or with a custom definition, create a support ticket under Authoring a policy so that the issue is routed correctly.

Scenario: Enforcement not as expected

Issue

A resource that you expect Azure Policy to act on isn't being acted on, and there's no entry in the Azure Activity log.

Cause

The policy assignment has been configured with an enforcementMode setting of Disabled. While enforcementMode is disabled, the policy effect isn't enforced, and no entry is written to the Activity log.

Resolution

Troubleshoot your policy assignment's enforcement by doing the following:

  1. First, wait an appropriate amount of time for an evaluation to finish and for compliance results to become available in the Azure portal or the SDK.

  2. To start a new evaluation scan with Azure PowerShell or the REST API, see On-demand evaluation scan.

  3. Ensure that the assignment parameters and assignment scope are set correctly and that enforcementMode is Enabled.

  4. Check the policy definition mode:

    • The mode should be all for all resource types.
    • The mode should be indexed if the policy definition checks for tags or location.
  5. Ensure that the scope of the resource isn't excluded or exempt.

  6. Verify that the resource payload matches the policy logic. You can do this by capturing an HTTP Archive (HAR) trace or by reviewing the Azure Resource Manager template (ARM template) properties.

  7. For other common issues and solutions, see Troubleshoot: Compliance not as expected.
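Steps 3 and 5 can be checked against an assignment's JSON. In the REST representation, the portal's Enabled and Disabled settings correspond to `enforcementMode` values of `Default` and `DoNotEnforce`, and exclusions are listed in `notScopes`. The Python sketch below assumes that shape; it doesn't look at policy exemptions (a separate resource), and it skips the scope check for management group scopes because resolving the management group hierarchy needs another lookup. The IDs in the usage example are placeholders.

```python
def enforcement_blockers(assignment: dict, resource_id: str) -> list:
    """List reasons an assignment's effect wouldn't be enforced on a resource (sketch)."""
    props = assignment.get("properties", assignment)
    reasons = []
    # "Default" means enforced; "DoNotEnforce" is shown as Disabled in the portal.
    if props.get("enforcementMode", "Default") == "DoNotEnforce":
        reasons.append("enforcementMode is DoNotEnforce (Disabled)")

    rid = resource_id.lower().rstrip("/")
    for excluded in props.get("notScopes", []):
        ex = excluded.lower().rstrip("/")
        # Match the excluded scope itself or anything beneath it, not sibling names.
        if rid == ex or rid.startswith(ex + "/"):
            reasons.append(f"resource is under excluded scope {excluded}")

    scope = props.get("scope", "").lower().rstrip("/")
    is_mg_scope = scope.startswith("/providers/microsoft.management/")
    if scope and not is_mg_scope and not (rid == scope or rid.startswith(scope + "/")):
        reasons.append("resource is outside the assignment scope")
    return reasons


assignment = {
    "properties": {
        "scope": "/subscriptions/s1",
        "enforcementMode": "DoNotEnforce",
        "notScopes": ["/subscriptions/s1/resourceGroups/rg-excluded"],
    }
}
print(enforcement_blockers(
    assignment,
    "/subscriptions/s1/resourceGroups/rg-excluded/providers/Microsoft.Storage/storageAccounts/sa1",
))
```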

If you still have an issue with a duplicated and customized built-in policy definition or with a custom definition, create a support ticket under Authoring a policy so that the issue is routed correctly.

Scenario: Denied by Azure Policy

Issue

Creation or update of a resource is denied.

Cause

A policy assignment at the scope of your new or updated resource uses a policy definition with a Deny effect, and the resource meets that definition's conditions. Resources that meet these conditions are prevented from being created or updated.

Resolution

The error message from a deny policy assignment includes the policy definition and policy assignment IDs. If you miss the error information in the message, it's also available in the Activity log. Use this information to learn more about the resource restrictions, and adjust the resource properties in your request to match the allowed values.
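When you capture the failed request's response body, the definition and assignment IDs can be pulled out programmatically. The Python sketch below assumes the Resource Manager error shape for policy denials: an `error.code` of `RequestDisallowedByPolicy` and `additionalInfo` entries of type `PolicyViolation` whose `info` object carries `policyDefinitionId` and `policyAssignmentId`. Verify that shape against a real response; the IDs in the sample body are placeholders.

```python
import json


def policy_violations(error_body: str) -> list:
    """Extract policy definition and assignment IDs from a deny error body (assumed shape)."""
    error = json.loads(error_body).get("error", {})
    if error.get("code") != "RequestDisallowedByPolicy":
        return []
    results = []
    for entry in error.get("additionalInfo", []):
        if entry.get("type") == "PolicyViolation":
            info = entry.get("info", {})
            results.append({
                "policyDefinitionId": info.get("policyDefinitionId"),
                "policyAssignmentId": info.get("policyAssignmentId"),
            })
    return results


# Placeholder body illustrating the assumed structure.
sample_body = json.dumps({
    "error": {
        "code": "RequestDisallowedByPolicy",
        "message": "Resource 'sa1' was disallowed by policy.",
        "additionalInfo": [{
            "type": "PolicyViolation",
            "info": {
                "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/example-def",
                "policyAssignmentId": "/subscriptions/s1/providers/Microsoft.Authorization/policyAssignments/example-assign",
            },
        }],
    }
})
print(policy_violations(sample_body))
```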

Scenario: Definition targets multiple resource types

Issue

A policy definition that includes multiple resource types fails validation during creation or update with the following error:

The policy definition '{0}' targets multiple resource types, but the policy rule is authored in a way that makes the policy not applicable to the target resource types '{1}'.

Cause

The policy definition rule has one or more conditions that aren't evaluated for the target resource types.

Resolution

If an alias is used, make sure that the alias is evaluated only against the resource type it belongs to by adding a type condition before it. An alternative is to split the policy definition into multiple definitions so that no single definition targets multiple resource types.
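The type-condition approach can be sketched by pairing each alias condition with a `type` check inside an `allOf`. The Python below builds such a rule as a dict; the two resource types, their aliases, and the `audit` effect are illustrative choices, and the `guarded` helper is hypothetical.

```python
import json


def guarded(resource_type: str, condition: dict) -> dict:
    """Wrap an alias condition so it's only evaluated for the resource type that owns the alias."""
    return {"allOf": [{"field": "type", "equals": resource_type}, condition]}


policy_rule = {
    "if": {
        "anyOf": [
            guarded(
                "Microsoft.Storage/storageAccounts",
                {"field": "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly", "equals": "false"},
            ),
            guarded(
                "Microsoft.Web/sites",
                {"field": "Microsoft.Web/sites/httpsOnly", "equals": "false"},
            ),
        ]
    },
    "then": {"effect": "audit"},
}

print(json.dumps(policy_rule, indent=2))
```

Because each alias sits behind its own type check, every condition applies to exactly one of the targeted resource types.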

Template errors

Scenario: Policy-supported functions are processed by the template

Issue

Azure Policy supports a number of ARM template functions as well as functions that are available only in a policy definition. Resource Manager processes these functions as part of a deployment instead of as part of a policy definition.

Cause

Using supported functions, such as parameters() or resourceGroup(), causes the function to be evaluated at deployment time instead of being left in the policy definition for the Azure Policy engine to process.

Resolution

To pass a function through as part of a policy definition, escape the entire string with [ so that the property looks like [[resourceGroup().tags.myTag]. The escape character causes Resource Manager to treat the value as a string when it processes the template. Azure Policy then places the function into the policy definition, which allows it to be evaluated dynamically as expected. For more information, see Syntax and expressions in Azure Resource Manager templates.
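If you generate templates programmatically, the escaping can be applied in one place. This Python sketch doubles the leading bracket of a policy expression, and a second helper approximates what Resource Manager stores after processing (one leading `[` removed). Both helpers are hypothetical, and the condition that uses them is illustrative.

```python
def escape_for_arm(expression: str) -> str:
    """Prefix an extra '[' so ARM template processing keeps the expression as a literal string."""
    if expression.startswith("[") and not expression.startswith("[["):
        return "[" + expression
    return expression


def arm_literal_value(template_value: str) -> str:
    """Approximate the value ARM passes on for an escaped string: one leading '[' is dropped."""
    if template_value.startswith("[["):
        return template_value[1:]
    return template_value


# Illustrative policy condition embedded in a template.
condition = {"field": "tags['myTag']", "equals": escape_for_arm("[resourceGroup().tags.myTag]")}
print(condition["equals"])
print(arm_literal_value(condition["equals"]))
```

The first line printed is the escaped template value, and the second is the expression Azure Policy receives and evaluates at policy time.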

Add-on for Kubernetes installation errors

Scenario: Installation by using a Helm Chart fails because of a password error

Issue

The helm install azure-policy-addon command fails and returns one of the following errors:

  • !: event not found
  • Error: failed parsing --set data: key "<key>" has no value (cannot end with ,)

Cause

The generated password includes a comma (,), which Helm treats as a separator when it parses --set values.

Resolution

When you run helm install azure-policy-addon, escape the comma (,) in the password value with a backslash (\).
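When the install command is scripted, the escaping can be applied before the value reaches `--set`. The Python sketch below escapes commas as described above and also doubles any existing backslash, on the assumption that Helm's `--set` parser reads a backslash as an escape for the next character. The chart reference and the `example.password` key are hypothetical placeholders, not the add-on's real values.

```python
import shlex


def escape_helm_set_value(value: str) -> str:
    """Escape backslashes (assumed escape char) and commas (--set separator) in a value."""
    return value.replace("\\", "\\\\").replace(",", "\\,")


def helm_install_args(release: str, chart: str, settings: dict) -> list:
    """Build a helm install argument list with escaped --set values."""
    args = ["helm", "install", release, chart]
    for key, value in settings.items():
        args += ["--set", f"{key}={escape_helm_set_value(value)}"]
    return args


# Hypothetical chart reference and setting key, for illustration only.
args = helm_install_args("azure-policy-addon", "<chart-reference>", {"example.password": "p,w"})
print(shlex.join(args))
```

Passing `args` as a list to `subprocess.run` skips the shell entirely, which also keeps an interactive shell from interpreting characters such as `!` in the password.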

Scenario: Installation by using a Helm Chart fails because the name already exists

Issue

The helm install azure-policy-addon command fails and returns the following error:

  • Error: cannot re-use a name that is still in use

Cause

A Helm Chart named azure-policy-addon has already been installed, fully or partially.

Resolution

Follow the instructions to remove the Azure Policy for Kubernetes add-on, and then rerun the helm install azure-policy-addon command.

Scenario: Azure virtual machine user-assigned identities are replaced by system-assigned managed identities

Issue

After you assign Guest Configuration policy initiatives to audit settings inside a machine, the user-assigned managed identities that were assigned to the machine are no longer assigned. Only a system-assigned managed identity is assigned.

Cause

The policy definitions that were previously used in Guest Configuration DeployIfNotExists definitions ensured that a system-assigned identity was assigned to the machine, but they also removed any user-assigned identity assignments.

Resolution

The definitions that previously caused this issue appear as [Deprecated], and they're replaced by policy definitions that manage prerequisites without removing user-assigned managed identities. A manual step is required: delete any existing policy assignments that are marked as [Deprecated], and replace them with the updated prerequisite policy initiative and policy definitions, which have the same names as the originals.

For a detailed narrative, see the blog post Important change released for Guest Configuration audit policies.
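The difference between the old and new behavior comes down to how the VM's Resource Manager `identity` block is written. Setting `type` to `SystemAssigned` alone drops any user-assigned identities, while `SystemAssigned, UserAssigned` with the existing `userAssignedIdentities` map keeps them. The Python sketch below assumes that `identity` shape; the `add_system_assigned` helper and the identity resource ID are hypothetical.

```python
from typing import Optional


def add_system_assigned(identity: Optional[dict]) -> dict:
    """Return an identity block that adds a system-assigned identity
    while keeping any existing user-assigned identities."""
    identity = identity or {}
    types = {
        t.strip()
        for t in identity.get("type", "None").split(",")
        if t.strip() and t.strip() != "None"
    }
    types.add("SystemAssigned")
    if "UserAssigned" in types:
        # Keep both the combined type and the map; a plain "SystemAssigned" type would drop them.
        return {
            "type": "SystemAssigned, UserAssigned",
            "userAssignedIdentities": dict(identity.get("userAssignedIdentities") or {}),
        }
    return {"type": "SystemAssigned"}


uid = "/subscriptions/s1/resourceGroups/rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id1"
print(add_system_assigned({"type": "UserAssigned", "userAssignedIdentities": {uid: {}}}))
```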

Add-on for Kubernetes general errors

Scenario: The add-on is unable to reach the Azure Policy service endpoint because of egress restrictions

Issue

The add-on can't reach the Azure Policy service endpoint and returns one of the following errors:

  • failed to fetch token, service not reachable
  • Error getting file "Get https://raw.githubusercontent.com/Azure/azure-policy/master/built-in-references/Kubernetes/container-allowed-images/template.yaml: dial tcp 151.101.228.133:443: connect: connection refused

Cause

This issue occurs when cluster egress is locked down.

Resolution

Ensure that the domains and ports mentioned in the following articles are open:
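To confirm whether egress rules block a specific endpoint, a plain TCP connection test run from inside the cluster network (for example, from a debugging pod) is often enough. The Python sketch below uses only the standard library; the host in the example is the one from the error message above, and port 443 is assumed.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures, and timeouts
        return False


print(can_reach("raw.githubusercontent.com", 443))
```

A `False` result for an endpoint that the add-on needs points to a firewall or egress rule. A successful TCP connection doesn't rule out TLS inspection or proxy issues.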

Scenario: The add-on is unable to reach the Azure Policy service endpoint because of the aad-pod-identity configuration

Issue

The add-on can't reach the Azure Policy service endpoint and returns one of the following errors:

  • azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://gov-prod-policy-data.trafficmanager.cn/checkDataPolicyCompliance?api-version=2019-01-01-preview: StatusCode=404
  • adal: Refresh request failed. Status Code = '404'. Response body: getting assigned identities for pod kube-system/azure-policy-8c785548f-r882p in CREATED state failed after 16 attempts, retry duration [5]s, error: <nil>

Cause

This error occurs when aad-pod-identity is installed on the cluster and the kube-system pods aren't excluded in aad-pod-identity.

The aad-pod-identity component Node Managed Identity (NMI) pods modify the nodes' iptables to intercept calls to the Azure Instance Metadata endpoint. This setup means that NMI intercepts every request made to the metadata endpoint, even if the pod doesn't use aad-pod-identity. You can configure the AzurePodIdentityException CustomResourceDefinition (CRD) to tell aad-pod-identity that any request to the metadata endpoint that originates from a pod matching the labels defined in the CRD should be proxied without any processing in NMI.

Resolution

Exclude the system pods that have the kubernetes.azure.com/managedby: aks label in the kube-system namespace from aad-pod-identity by configuring the AzurePodIdentityException CRD.

For more information, see Disable the Azure Active Directory (Azure AD) pod identity for a specific pod/application.

To configure an exception, follow this example:

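# Exception for the aad-pod-identity MIC pods (labels app: mic and component: mic) in the default namespace.
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzurePodIdentityException
metadata:
  name: mic-exception
  namespace: default
spec:
  podLabels:
    app: mic
    component: mic
---
# Exception for AKS-managed system pods in kube-system, such as the Azure Policy add-on pods,
# so that NMI proxies their metadata endpoint requests without processing them.
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzurePodIdentityException
metadata:
  name: aks-addon-exception
  namespace: kube-system
spec:
  podLabels:
    kubernetes.azure.com/managedby: aks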

Scenario: The resource provider isn't registered

Issue

The add-on can reach the Azure Policy service endpoint, but the add-on logs display one of the following errors:

  • The resource provider 'Microsoft.PolicyInsights' is not registered in subscription '{subId}'. See https://aka.ms/policy-register-subscription for how to register subscriptions.

  • policyinsightsdataplane.BaseClient#CheckDataPolicyCompliance: Failure responding to request: StatusCode=500 -- Original Error: autorest/azure: Service returned an error. Status=500 Code="InternalServerError" Message="Encountered an internal server error.

Cause

The 'Microsoft.PolicyInsights' resource provider isn't registered. It must be registered for the add-on to get policy definitions and return compliance data.

Resolution

Register the 'Microsoft.PolicyInsights' resource provider in the cluster subscription. For instructions, see Register a resource provider.
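Registration is a single Resource Manager call: a POST to the provider's `register` action in the subscription (the Azure CLI equivalent is `az provider register --namespace Microsoft.PolicyInsights`). The Python sketch below only builds that URL; the `2021-04-01` API version is an assumption to check against the current Resource Manager REST reference, and the subscription ID is a placeholder.

```python
from urllib.parse import quote


def register_provider_url(
    subscription_id: str,
    namespace: str = "Microsoft.PolicyInsights",
    api_version: str = "2021-04-01",  # assumption: confirm against the current REST reference
) -> str:
    """Build the Resource Manager URL that registers a resource provider in a subscription."""
    return (
        f"https://management.azure.com/subscriptions/{quote(subscription_id)}"
        f"/providers/{namespace}/register?api-version={api_version}"
    )


print(register_provider_url("00000000-0000-0000-0000-000000000000"))
```

Registration can take a few minutes to complete, so recheck the add-on logs after the provider's registration state shows as registered.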

Scenario: The subscription is disabled

Issue

The add-on can reach the Azure Policy service endpoint, but the following error is displayed:

The subscription '{subId}' has been disabled for azure data-plane policy. Please contact support.

Cause

This error means that the subscription was determined to be problematic, and the feature flag Microsoft.PolicyInsights/DataPlaneBlocked was added to block the subscription.

Resolution

To investigate and resolve this issue, contact the feature team.

Next steps

If your problem isn't listed in this article or you can't resolve it, get support by visiting one of the following channels: