Troubleshoot errors when using Azure Policy
When you create policy definitions, work with SDKs, or set up the Azure Policy for Kubernetes add-on, you might run into errors. This article describes various general errors that might occur, and it suggests ways to resolve them.
Find error details
The location of the error details depends on what aspect of Azure Policy you're working with.
- If you're working with a custom policy, go to the Azure portal to get linting feedback about the schema, or review resulting compliance data to see how resources were evaluated.
- If you're working with any of the various SDKs, the SDK provides details about why the function failed.
- If you're working with the add-on for Kubernetes, start with the logging in the cluster.
General errors
Scenario: Alias not found
Issue
An incorrect or nonexistent alias is used in a policy definition. Azure Policy uses aliases to map to Azure Resource Manager properties.
Cause
An incorrect or nonexistent alias is used in a policy definition.
Resolution
First, validate that the Resource Manager property has an alias. If the alias for a Resource Manager property doesn't exist, create a support ticket.
Scenario: Evaluation details aren't up to date
Issue
A resource is in the Not Started state, or the compliance details aren't current.
Cause
A new policy or initiative assignment takes about five minutes to be applied. New or updated resources within scope of an existing assignment become available in about 15 minutes. A standard compliance scan occurs every 24 hours. For more information, see evaluation triggers.
Resolution
First, wait an appropriate amount of time for an evaluation to finish and compliance results to become available in the Azure portal or the SDK. To start a new evaluation scan with Azure PowerShell or the REST API, see On-demand evaluation scan.
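The waiting periods described above can be expressed as a small lookup, which is handy when a script has to decide whether it's still too early to expect compliance results. This is only a sketch: the trigger names and the results_expected function are hypothetical, and the durations are the approximate values quoted in this scenario, not guarantees.

```python
from datetime import datetime, timedelta

# Approximate delays quoted in this article; the keys are hypothetical labels.
EXPECTED_DELAY = {
    "new_assignment": timedelta(minutes=5),    # new policy or initiative assignment
    "resource_change": timedelta(minutes=15),  # new or updated resource in scope
    "standard_scan": timedelta(hours=24),      # standard compliance scan cycle
}


def results_expected(trigger: str, changed_at: datetime, now: datetime) -> bool:
    """Return True once the approximate delay for the trigger has elapsed."""
    return now - changed_at >= EXPECTED_DELAY[trigger]


changed = datetime(2024, 1, 1, 12, 0)
print(results_expected("new_assignment", changed, changed + timedelta(minutes=6)))
print(results_expected("resource_change", changed, changed + timedelta(minutes=10)))
```

If the function returns False, waiting longer is usually the right first step before starting an on-demand evaluation scan.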
Scenario: Compliance isn't as expected
Issue
A resource isn't in the evaluation state, either Compliant or Non-compliant, that you expect for it.
Cause
The resource isn't in the correct scope for the policy assignment, or the policy definition doesn't operate as intended.
Resolution
To troubleshoot your policy definition, do the following steps:
- First, wait the appropriate amount of time for an evaluation to finish and compliance results to become available in the Azure portal or SDK.
- To start a new evaluation scan with Azure PowerShell or the REST API, see On-demand evaluation scan.
- Ensure that the assignment parameters and assignment scope are set correctly.
- Check the policy definition mode:
  - The mode should be all for all resource types.
  - The mode should be indexed if the policy definition checks for tags or location.
- Ensure that the scope of the resource isn't excluded or exempt.
- If compliance for a policy assignment shows 0/0 resources, no resources were determined to be applicable within the assignment scope. Check both the policy definition and the assignment scope.
- For a noncompliant resource that was expected to be compliant, see Determine the reasons for noncompliance. The comparison of the definition to the evaluated property value indicates why a resource was noncompliant.
  - If the target value is wrong, revise the policy definition.
  - If the current value is wrong, validate the resource payload through Resource Explorer.
- For a Resource Provider mode definition that supports a RegEx string parameter (such as Microsoft.Kubernetes.Data and the built-in definition "Container images should be deployed from trusted registries only"), validate that the RegEx string parameter is correct.
- For other common issues and solutions, see Troubleshoot: Enforcement not as expected.
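The scope and exclusion checks in the preceding list can be reasoned about with a short sketch. It assumes a simplified model in which the assignment scope is a subscription or resource group, a resource is in scope when its resource ID sits at or below the scope, and it's excluded when it sits at or below any excluded scope. The function names and sample IDs are hypothetical, and management group scopes and exemptions aren't modeled.

```python
from typing import Sequence


def is_under(resource_id: str, scope: str) -> bool:
    """Return True if resource_id equals scope or is nested below it.

    The comparison is case-insensitive and must end on a path-segment
    boundary, so '/rg-app' doesn't match '/rg-app2'.
    """
    rid = resource_id.rstrip("/").lower()
    sc = scope.rstrip("/").lower()
    return rid == sc or rid.startswith(sc + "/")


def is_applicable(resource_id: str, assignment_scope: str,
                  excluded_scopes: Sequence[str]) -> bool:
    """Simplified check: inside the assignment scope and not excluded."""
    if not is_under(resource_id, assignment_scope):
        return False
    return not any(is_under(resource_id, ex) for ex in excluded_scopes)


# Hypothetical IDs for illustration only.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
VM = SUB + "/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm1"

print(is_applicable(VM, SUB, []))                             # in scope
print(is_applicable(VM, SUB, [SUB + "/resourceGroups/rg-app"]))  # excluded
```

A resource that this kind of check reports as excluded or out of scope won't be counted for the assignment, which is one way an assignment ends up showing 0/0 resources.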
If you still have an issue with your duplicated and customized built-in policy definition or custom definition, create a support ticket under Authoring a policy to route the issue correctly.
Scenario: Enforcement not as expected
Issue
A resource that you expect Azure Policy to act on isn't being acted on, and there's no entry in the Azure Activity log.
Cause
The policy assignment was configured for an enforcementMode setting of Disabled. While enforcementMode is disabled, the policy effect isn't enforced, and there's no entry in the Activity log.
Resolution
Troubleshoot your policy assignment's enforcement by doing the following steps:
- First, wait the appropriate amount of time for an evaluation to finish and compliance results to become available in the Azure portal or the SDK.
- To start a new evaluation scan with Azure PowerShell or the REST API, see On-demand evaluation scan.
- Ensure that the assignment parameters and assignment scope are set correctly and that enforcementMode is Enabled.
- Check the policy definition mode:
  - The mode should be all for all resource types.
  - The mode should be indexed if the policy definition checks for tags or location.
- Ensure that the scope of the resource isn't excluded or exempt.
- Verify that the resource payload matches the policy logic. This verification can be done by capturing an HTTP Archive (HAR) trace or reviewing the Azure Resource Manager template (ARM template) properties.
- For other common issues and solutions, see Troubleshoot: Compliance not as expected.
If you still have an issue with your duplicated and customized built-in policy definition or custom definition, create a support ticket under Authoring a policy to route the issue correctly.
Scenario: Denied by Azure Policy
Issue
Creation or update of a resource is denied.
Cause
A policy assignment to the scope of your new or updated resource meets the criteria of a policy definition with a Deny effect. Resources that meet these definitions are prevented from being created or updated.
Resolution
The error message from a deny policy assignment includes the policy definition and policy assignment IDs. If the error information in the message is missed, it's also available in the Activity log. Use this information to get more details to understand the resource restrictions and adjust the resource properties in your request to match allowed values.
Scenario: Definition targets multiple resource types
Issue
A policy definition that includes multiple resource types fails validation during creation or update with the following error:
The policy definition '{0}' targets multiple resource types, but the policy rule is authored in a way that makes the policy not applicable to the target resource types '{1}'.
Cause
The policy definition rule has one or more conditions that don't get evaluated by the target resource types.
Resolution
If an alias is used, make sure that the alias gets evaluated against only the resource type it belongs to by adding a type condition before it. An alternative is to split the policy definition into multiple definitions to avoid targeting multiple resource types.
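As an illustration of adding a type condition before an alias, the following sketch builds the "if" block of a policy rule as plain Python dictionaries and prints it as JSON. The guard_with_type helper is hypothetical, and the aliases and SKU values are sample inputs chosen for illustration, not a recommended policy.

```python
import json


def guard_with_type(resource_type: str, alias_condition: dict) -> dict:
    """Wrap an alias condition so it's only evaluated for its own resource type."""
    return {
        "allOf": [
            {"field": "type", "equals": resource_type},
            alias_condition,
        ]
    }


# Each alias is paired with the resource type it belongs to.
policy_if = {
    "anyOf": [
        guard_with_type(
            "Microsoft.Storage/storageAccounts",
            {"field": "Microsoft.Storage/storageAccounts/sku.name",
             "notIn": ["Standard_LRS"]},
        ),
        guard_with_type(
            "Microsoft.Compute/virtualMachines",
            {"field": "Microsoft.Compute/virtualMachines/sku.name",
             "notIn": ["Standard_B2s"]},
        ),
    ]
}

print(json.dumps(policy_if, indent=2))
```

Because each alias condition sits behind a matching type check, neither branch is evaluated against a resource type that the alias doesn't belong to.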
Scenario: Subscription limit exceeded
Issue
An error message on the compliance page in Azure portal is shown when retrieving compliance for policy assignments.
Cause
The number of subscriptions under the selected scopes in the request exceeded the limit of 5,000 subscriptions. The compliance results might be partially displayed.
Resolution
To see the complete results, select a more granular scope with fewer child subscriptions.
Template errors
Scenario: Policy supported functions processed by template
Issue
Azure Policy supports many ARM template functions and functions that are available only in a policy definition. Resource Manager processes these functions as part of a deployment instead of as part of a policy definition.
Cause
When a template uses supported functions, such as parameters() or resourceGroup(), Resource Manager evaluates them at deployment time and writes the result into the policy definition, instead of leaving the function in place for the Azure Policy engine to process.
Resolution
To pass a function through as part of a policy definition, escape the entire string with [ such that the property looks like [[resourceGroup().tags.myTag]. The escape character causes Resource Manager to treat the value as a string when it processes the template. Azure Policy then places the function into the policy definition, which allows it to be dynamic as expected. For more information, see Syntax and expressions in Azure Resource Manager templates.
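The escaping rule can be applied mechanically before a policy rule is embedded in a template. The following sketch assumes the policy rule is already loaded as Python dictionaries and lists; both helper names are hypothetical. It prefixes every string of the form [ ... ] with one extra [ and leaves other values, including strings that are already escaped, unchanged.

```python
def escape_for_arm_template(value: str) -> str:
    """Add a leading '[' to a bracketed expression so Resource Manager
    treats it as a literal string instead of evaluating it."""
    is_expression = value.startswith("[") and value.endswith("]")
    if is_expression and not value.startswith("[["):
        return "[" + value
    return value


def escape_policy_rule(node):
    """Recursively escape every string expression in a policy rule."""
    if isinstance(node, dict):
        return {key: escape_policy_rule(val) for key, val in node.items()}
    if isinstance(node, list):
        return [escape_policy_rule(item) for item in node]
    if isinstance(node, str):
        return escape_for_arm_template(node)
    return node


rule = {"field": "tags.myTag", "equals": "[resourceGroup().tags.myTag]"}
print(escape_policy_rule(rule))
```

In this example only the equals value changes, to [[resourceGroup().tags.myTag], because tags.myTag isn't a bracketed expression.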
Add-on for Kubernetes installation errors
Scenario: Installation by using a Helm Chart fails because of a password error
Issue
The helm install azure-policy-addon command fails, and it returns one of the following errors:
!: event not found
Error: failed parsing --set data: key "<key>" has no value (cannot end with ,)
Cause
The generated password includes a comma (,), which the Helm Chart is splitting on.
Resolution
When you run helm install azure-policy-addon, escape the comma (,) in the password value with a backslash (\).
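If the password is generated by a script, the comma escaping can be done before the value is handed to Helm. The following is a minimal sketch: the function name and the sample password are hypothetical, <key> stands in for whatever key your command sets, and only commas are escaped, as described in this scenario.

```python
def escape_helm_set_value(value: str) -> str:
    """Escape commas with a backslash so that --set doesn't split on them."""
    return value.replace(",", "\\,")


password = "s3cr,et"  # hypothetical generated password that contains a comma
set_argument = f"<key>={escape_helm_set_value(password)}"
print(set_argument)  # prints: <key>=s3cr\,et
```

When you paste the escaped value into a shell command, quote it so that the shell doesn't interpret the backslash itself.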
Scenario: Installation by using a Helm Chart fails because the name already exists
Issue
The helm install azure-policy-addon command fails, and it returns the following error:
Error: cannot re-use a name that is still in use
Cause
The Helm Chart with the name azure-policy-addon was already installed or partially installed.
Resolution
Follow the instructions to remove the Azure Policy for Kubernetes add-on, then rerun the helm install azure-policy-addon command.
Scenario: Azure virtual machine user-assigned identities are replaced by system-assigned managed identities
Issue
After you assign Guest Configuration policy initiatives to audit settings inside a machine, the user-assigned managed identities that were assigned to the machine are no longer assigned. Only a system-assigned managed identity is assigned.
Cause
The deployIfNotExists policy definitions that Guest Configuration previously used ensured that a system-assigned identity was assigned to the machine, but they also removed the user-assigned identity assignments.
Resolution
The definitions that previously caused this issue appear as [Deprecated], and are replaced by policy definitions that manage prerequisites without removing user-assigned managed identities. A manual step is required. Delete any existing policy assignments that are marked as [Deprecated], and replace them with the updated prerequisite policy initiative and policy definitions that have the same name as the original.
For a detailed narrative, see the blog post Important change released for Guest Configuration audit policies.
Add-on for Kubernetes general errors
Scenario: The add-on is unable to reach the Azure Policy service endpoint because of egress restrictions
Issue
The add-on can't reach the Azure Policy service endpoint, and it returns one of the following errors:
failed to fetch token, service not reachable
Error getting file "Get https://raw.githubusercontent.com/Azure/azure-policy/master/built-in-references/Kubernetes/container-allowed-images/template.yaml: dial tcp 151.101.228.133.443: connect: connection refused
Cause
This issue occurs when a cluster egress is locked down.
Resolution
Ensure that the domains and ports mentioned in the following article are open:
Scenario: The add-on is unable to reach the Azure Policy service endpoint because of the aad-pod-identity configuration
Issue
The add-on can't reach the Azure Policy service endpoint, and it returns one of the following errors:
azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://gov-prod-policy-data.trafficmanager.cn/checkDataPolicyCompliance?api-version=2019-01-01-preview: StatusCode=404
adal: Refresh request failed. Status Code = '404'. Response body: getting assigned identities for pod kube-system/azure-policy-8c785548f-r882p in CREATED state failed after 16 attempts, retry duration [5]s, error: <nil>
Cause
This error occurs when aad-pod-identity is installed on the cluster and the kube-system pods aren't excluded in aad-pod-identity.
The aad-pod-identity component Node Managed Identity (NMI) pods modify the nodes' iptables to intercept calls to the Azure instance metadata endpoint. This setup means that any request made to the metadata endpoint is intercepted by NMI, even if the pod doesn't use aad-pod-identity. The AzurePodIdentityException CustomResourceDefinition (CRD) can be configured to inform aad-pod-identity that any requests to a metadata endpoint that originate from a pod matching the labels defined in the CRD should be proxied without any processing in NMI.
Resolution
Exclude the system pods that have the kubernetes.azure.com/managedby: aks label in the kube-system namespace in aad-pod-identity by configuring the AzurePodIdentityException CRD.
For more information, see Disable the Azure Active Directory (Azure AD) pod identity for a specific pod/application.
To configure an exception, follow this example:
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzurePodIdentityException
metadata:
  name: mic-exception
  namespace: default
spec:
  podLabels:
    app: mic
    component: mic
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzurePodIdentityException
metadata:
  name: aks-addon-exception
  namespace: kube-system
spec:
  podLabels:
    kubernetes.azure.com/managedby: aks
Scenario: The resource provider isn't registered
Issue
The add-on can reach the Azure Policy service endpoint, but the add-on logs display one of the following errors:
The resource provider 'Microsoft.PolicyInsights' is not registered in subscription '{subId}'. See https://aka.ms/policy-register-subscription for how to register subscriptions.
policyinsightsdataplane.BaseClient#CheckDataPolicyCompliance: Failure responding to request: StatusCode=500 -- Original Error: autorest/azure: Service returned an error. Status=500 Code="InternalServerError" Message="Encountered an internal server error.
Cause
The Microsoft.PolicyInsights resource provider isn't registered. It must be registered for the add-on to get policy definitions and return compliance data.
Resolution
Register the Microsoft.PolicyInsights resource provider in the cluster subscription. For instructions, see Register a resource provider.
Scenario: The subscription is disabled
Issue
The add-on can reach the Azure Policy service endpoint, but the following error is displayed:
The subscription '{subId}' has been disabled for azure data-plane policy. Please contact support.
Cause
This error means that the subscription was determined to be problematic, and the feature flag Microsoft.PolicyInsights/DataPlaneBlocked was added to block the subscription.
Resolution
To investigate and resolve this issue, contact the feature team.
Scenario: Definitions in category "Guest Configuration" cannot be duplicated from Azure portal
Issue
When attempting to create a custom policy definition from the Azure portal page for policy definitions, you select the Duplicate definition button. After assigning the policy, you find machines are NonCompliant because no guest configuration assignment resource exists.
Cause
Guest configuration relies on custom metadata added to policy definitions when creating guest configuration assignment resources. The Duplicate definition activity in the Azure portal doesn't copy custom metadata.
Resolution
Instead of using the portal, duplicate the policy definition using the Policy Insights API. The following PowerShell sample provides an option.
# Duplicates the built-in policy that audits Windows machines for pending reboots, including its custom metadata
$def = Get-AzPolicyDefinition -Id '/providers/Microsoft.Authorization/policyDefinitions/4221adbc-5c0f-474f-88b7-037a99e6114c' | ForEach-Object Properties
# Metadata, parameters, and the policy rule are passed as JSON strings; -Depth keeps nested objects from being truncated
New-AzPolicyDefinition -Name (New-Guid).Guid -DisplayName "$($def.DisplayName) (Copy)" -Description $def.Description -Metadata ($def.Metadata | ConvertTo-Json -Depth 15) -Parameter ($def.Parameters | ConvertTo-Json -Depth 15) -Policy ($def.PolicyRule | ConvertTo-Json -Depth 15)
Scenario: Kubernetes resource gets created during connectivity failure despite deny policy being assigned
Issue
If there's a Kubernetes cluster connectivity failure, evaluation for newly created or updated resources might be bypassed due to Gatekeeper's fail-open behavior.
Cause
Gatekeeper's fail-open model is by design and based on community feedback. The Gatekeeper documentation expands on these reasons: https://open-policy-agent.github.io/gatekeeper/website/docs/failing-closed#considerations.
Resolution
When this happens, you can monitor the error case through the admission webhook metrics that kube-apiserver provides. If evaluation is bypassed at creation time and an object is created, Azure Policy compliance reports it as noncompliant to flag it to customers.
Regardless of the scenario, Azure Policy retains the last known policy on the cluster and keeps the guardrails in place.
Next steps
If your problem isn't listed in this article or you can't resolve it, get support by visiting one of the following channels:
Get answers from experts through Microsoft Q&A.
If you still need help, go to the Azure support site and submit your request.