Troubleshoot errors when using Azure Policy
When you create policy definitions, work with SDKs, or set up the Azure Policy for Kubernetes add-on, you might run into errors. This article describes various general errors that might occur, and it suggests ways to resolve them.
The location of the error details depends on what aspect of Azure Policy you're working with.
- If you're working with a custom policy, go to the Azure portal to get linting feedback about the schema, or review resulting compliance data to see how resources were evaluated.
- If you're working with any of the various SDKs, the SDK provides details about why the function failed.
- If you're working with the add-on for Kubernetes, start with the logging in the cluster.
An incorrect or nonexistent alias is used in a policy definition. Azure Policy uses aliases to map to Azure Resource Manager properties.
First, validate that the Resource Manager property has an alias. If the alias for a Resource Manager property doesn't exist, create a support ticket.
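One way to run that validation offline is to scan the policy rule for alias-like field values and compare them with a list of aliases retrieved separately. The following Python sketch is illustrative only: the helper names are hypothetical, the list of known aliases is supplied by the caller, and it assumes that an alias always contains a "/" and that alias names compare case-insensitively.

```python
from typing import Any, Iterable, List


def find_alias_fields(node: Any) -> List[str]:
    """Collect every "field" value in a policy rule that looks like an alias.

    Assumption: an alias contains a "/" (for example
    "Microsoft.Compute/virtualMachines/sku.name"), while fields such as
    "type", "location", or "tags.env" don't.
    """
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "field" and isinstance(value, str) and "/" in value:
                found.append(value)
            else:
                found.extend(find_alias_fields(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_alias_fields(item))
    return found


def unknown_aliases(policy_rule: dict, known_aliases: Iterable[str]) -> List[str]:
    """Return the aliases used in the rule that aren't in the known list."""
    known = {alias.lower() for alias in known_aliases}
    return [a for a in find_alias_fields(policy_rule) if a.lower() not in known]


# Example rule with a deliberately misspelled alias ("sku.nmae").
rule = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
            {"field": "Microsoft.Compute/virtualMachines/sku.nmae", "like": "Standard_D*"},
        ]
    },
    "then": {"effect": "audit"},
}
known = ["Microsoft.Compute/virtualMachines/sku.name"]
print(unknown_aliases(rule, known))
```

Any alias the sketch reports is either misspelled or missing from the list you supplied, so confirm against the current alias list before you open a support ticket.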
A resource is in the Not Started state, or the compliance details aren't current.
A new policy or initiative assignment takes about five minutes to be applied. New or updated resources within scope of an existing assignment become available in about 15 minutes. A standard compliance scan occurs every 24 hours. For more information, see evaluation triggers.
First, wait an appropriate amount of time for an evaluation to finish and compliance results to become available in the Azure portal or the SDK. To start a new evaluation scan with Azure PowerShell or the REST API, see On-demand evaluation scan.
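As a rough illustration of the REST option, the following Python sketch only builds the URL that an on-demand scan request would be posted to. It doesn't send the request or handle authentication. The function name is hypothetical, and the API version shown is an assumption to verify against the current REST reference.

```python
from typing import Optional

# Assumption: verify this API version against the current REST reference.
API_VERSION = "2019-10-01"


def trigger_evaluation_url(subscription_id: str, resource_group: Optional[str] = None) -> str:
    """Build the URL for an on-demand evaluation scan (to be sent as a POST).

    The scan covers the whole subscription unless a resource group is given.
    """
    scope = f"/subscriptions/{subscription_id}"
    if resource_group:
        scope += f"/resourceGroups/{resource_group}"
    return (
        "https://management.azure.com"
        f"{scope}/providers/Microsoft.PolicyInsights/policyStates/latest/triggerEvaluation"
        f"?api-version={API_VERSION}"
    )


# Placeholder subscription ID and resource group name.
print(trigger_evaluation_url("00000000-0000-0000-0000-000000000000"))
print(trigger_evaluation_url("00000000-0000-0000-0000-000000000000", "my-resource-group"))
```

The scan runs asynchronously, so compliance results still take some time to appear after the request is accepted.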
A resource isn't in either the Compliant or Not-Compliant evaluation state expected for the resource.
The resource isn't in the correct scope for the policy assignment, or the policy definition doesn't operate as intended.
To troubleshoot your policy definition, do the following steps:
- First, wait the appropriate amount of time for an evaluation to finish and compliance results to become available in the Azure portal or SDK.
- To start a new evaluation scan with Azure PowerShell or the REST API, see On-demand evaluation scan.
- Ensure that the assignment parameters and assignment scope are set correctly.
- Check the policy definition mode:
  - The mode should be all for all resource types.
  - The mode should be indexed if the policy definition checks for tags or location.
- Ensure that the scope of the resource isn't excluded or exempt.
- If compliance for a policy assignment shows 0/0 resources, no resources were determined to be applicable within the assignment scope. Check both the policy definition and the assignment scope.
- For a noncompliant resource that was expected to be compliant, see determine the reasons for noncompliance. The comparison of the definition to the evaluated property value indicates why a resource was noncompliant.
  - If the target value is wrong, revise the policy definition.
  - If the current value is wrong, validate the resource payload through Resource Explorer.
- For a Resource Provider mode definition that supports a RegEx string parameter (such as Microsoft.Kubernetes.Data and the built-in definition "Container images should be deployed from trusted registries only"), validate that the RegEx string parameter is correct.
- For other common issues and solutions, see Troubleshoot: Enforcement not as expected.
If you still have an issue with your duplicated and customized built-in policy definition or custom definition, create a support ticket under Authoring a policy to route the issue correctly.
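The mode check in the steps above can be approximated with a small script. The following Python sketch is an assumption-laden illustration, not an official validator: the function names are hypothetical, it only recognizes the fields location, tags, tags.<name>, and tags['<name>'], and it skips Resource Provider modes such as Microsoft.Kubernetes.Data.

```python
from typing import Any, List, Optional


def collect_fields(node: Any) -> List[str]:
    """Return every "field" value found anywhere in a policy rule."""
    fields: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "field" and isinstance(value, str):
                fields.append(value)
            else:
                fields.extend(collect_fields(value))
    elif isinstance(node, list):
        for item in node:
            fields.extend(collect_fields(item))
    return fields


def checks_tags_or_location(policy_rule: dict) -> bool:
    """True if the rule references location or any tags field."""
    for field in collect_fields(policy_rule):
        name = field.lower()
        if name in ("location", "tags") or name.startswith(("tags.", "tags[")):
            return True
    return False


def mode_hint(definition: dict) -> Optional[str]:
    """Suggest a mode change, or return None if nothing looks wrong."""
    mode = str(definition.get("mode", "")).lower()
    if mode not in ("all", "indexed"):
        return None  # Resource Provider modes are out of scope for this sketch.
    uses = checks_tags_or_location(definition.get("policyRule", {}))
    if uses and mode == "all":
        return "Rule checks tags or location; consider mode 'indexed'."
    if not uses and mode == "indexed":
        return "Rule doesn't check tags or location; consider mode 'all'."
    return None


# Example definition that checks a tag but uses mode "All".
definition = {
    "mode": "All",
    "policyRule": {
        "if": {"field": "tags['costCenter']", "exists": "false"},
        "then": {"effect": "audit"},
    },
}
print(mode_hint(definition))
```

Treat the hint as a prompt to review the definition rather than as a verdict, because some definitions legitimately combine these fields with other conditions.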
A resource that you expect Azure Policy to act on isn't being acted on, and there's no entry in the Azure Activity log.
The policy assignment was configured for an enforcementMode setting of Disabled. While enforcementMode is disabled, the policy effect isn't enforced, and there's no entry in the Activity log.
Troubleshoot your policy assignment's enforcement by doing the following steps:
- First, wait the appropriate amount of time for an evaluation to finish and compliance results to become available in the Azure portal or the SDK.
- To start a new evaluation scan with Azure PowerShell or the REST API, see On-demand evaluation scan.
- Ensure that the assignment parameters and assignment scope are set correctly and that enforcementMode is Enabled.
- Check the policy definition mode:
  - The mode should be all for all resource types.
  - The mode should be indexed if the policy definition checks for tags or location.
- Ensure that the scope of the resource isn't excluded or exempt.
- Verify that the resource payload matches the policy logic. This verification can be done by capturing an HTTP Archive (HAR) trace or reviewing the Azure Resource Manager template (ARM template) properties.
- For other common issues and solutions, see Troubleshoot: Compliance not as expected.
If you still have an issue with your duplicated and customized built-in policy definition or custom definition, create a support ticket under Authoring a policy to route the issue correctly.
Creation or update of a resource is denied.
A policy assignment to the scope of your new or updated resource meets the criteria of a policy definition with a Deny effect. Resources that meet these definitions are prevented from being created or updated.
The error message from a deny policy assignment includes the policy definition and policy assignment IDs. If the error information in the message is missed, it's also available in the Activity log. Use this information to get more details to understand the resource restrictions and adjust the resource properties in your request to match allowed values.
A policy definition that includes multiple resource types fails validation during creation or update with the following error:
The policy definition '{0}' targets multiple resource types, but the policy rule is authored in a way that makes the policy not applicable to the target resource types '{1}'.
The policy definition rule has one or more conditions that don't get evaluated by the target resource types.
If an alias is used, make sure that the alias gets evaluated against only the resource type it belongs to by adding a type condition before it. An alternative is to split the policy definition into multiple definitions to avoid targeting multiple resource types.
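To illustrate the first option, the following Python sketch wraps each alias condition in an allOf block that begins with a type condition, so the alias is evaluated only for the resource type it belongs to. The helper name guard_with_type is hypothetical, and the two resource types and aliases are examples chosen for illustration.

```python
import json
from typing import Any, Dict


def guard_with_type(resource_type: str, alias_condition: Dict[str, Any]) -> Dict[str, Any]:
    """Put a type condition in front of an alias condition."""
    return {
        "allOf": [
            {"field": "type", "equals": resource_type},
            alias_condition,
        ]
    }


# A rule "if" block that targets two resource types, each with its own guard.
rule_if = {
    "anyOf": [
        guard_with_type(
            "Microsoft.Compute/virtualMachines",
            {"field": "Microsoft.Compute/virtualMachines/sku.name", "like": "Standard_D*"},
        ),
        guard_with_type(
            "Microsoft.Storage/storageAccounts",
            {"field": "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly", "equals": "false"},
        ),
    ]
}
print(json.dumps(rule_if, indent=2))
```

The resource type is passed explicitly instead of being derived from the alias, because nested resource types make that derivation ambiguous.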
An error message is shown on the compliance page in the Azure portal when retrieving compliance for policy assignments.
The number of subscriptions under the selected scopes in the request exceeded the limit of 5,000 subscriptions. The compliance results might be partially displayed.
To see the complete results, select a more granular scope with fewer child subscriptions.
Azure Policy supports many ARM template functions and functions that are available only in a policy definition. Resource Manager processes these functions as part of a deployment instead of as part of a policy definition.
Using supported functions, such as parameter() or resourceGroup(), results in the processed outcome of the function at deployment time instead of leaving the function for the policy definition and the Azure Policy engine to process.
To pass a function through as part of a policy definition, escape the entire string with [ such that the property looks like [[resourceGroup().tags.myTag]. The escape character causes Resource Manager to treat the value as a string when it processes the template. Azure Policy then places the function into the policy definition, which allows it to be dynamic as expected. For more information, see Syntax and expressions in Azure Resource Manager templates.
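If you generate templates with a script, the escaping can be applied automatically. The following Python sketch shows one possible helper. The function name is hypothetical, and it assumes that any value that starts with a single [ and ends with ] is an expression meant for the Azure Policy engine.

```python
def escape_for_template(value: str) -> str:
    """Add a leading "[" so Resource Manager treats the expression as a literal string.

    Values that aren't bracketed expressions, or that are already escaped,
    are returned unchanged.
    """
    if value.startswith("[") and value.endswith("]") and not value.startswith("[["):
        return "[" + value
    return value


print(escape_for_template("[resourceGroup().tags.myTag]"))
```

The helper is idempotent, so running it twice on the same value doesn't add a second escape character.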
The helm install azure-policy-addon command fails, and it returns one of the following errors:
!: event not found
Error: failed parsing --set data: key "<key>" has no value (cannot end with ,)
The generated password includes a comma (,), which the Helm chart splits on.
When you run helm install azure-policy-addon, escape the comma (,) in the password value with a backslash (\).
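When the password is generated by a script, the escaping can be done before the value reaches Helm. The following Python sketch is one possible approach. The function names and the key example.secret are hypothetical, and the sketch assumes the input is the raw, not yet escaped, password. Wrapping the argument with shlex.quote also keeps a POSIX shell from interpreting characters such as ! in the value.

```python
import shlex


def escape_helm_set_value(value: str) -> str:
    """Escape commas so that helm --set doesn't split the value on them."""
    return value.replace(",", "\\,")


def set_argument(key: str, value: str) -> str:
    """Build a shell-safe key=value string to pass after --set."""
    return shlex.quote(f"{key}={escape_helm_set_value(value)}")


# Hypothetical key and password, for illustration only.
print(set_argument("example.secret", "ab,c!d"))
```

Pass the printed string as the argument that follows --set.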
The helm install azure-policy-addon command fails, and it returns the following error:
Error: cannot re-use a name that is still in use
The Helm chart with the name azure-policy-addon was already installed or partially installed.
Follow the instructions to remove the Azure Policy for Kubernetes add-on, then rerun the helm install azure-policy-addon command.
Scenario: Azure virtual machine user-assigned identities are replaced by system-assigned managed identities
After you assign Guest Configuration policy initiatives to audit settings inside a machine, the user-assigned managed identities that were assigned to the machine are no longer assigned. Only a system-assigned managed identity is assigned.
The policy definitions that were previously used in Guest Configuration deployIfNotExists definitions ensured that a system-assigned identity is assigned to the machine. But they also removed the user-assigned identity assignments.
The definitions that previously caused this issue appear as [Deprecated], and are replaced by policy definitions that manage prerequisites without removing user-assigned managed identities. A manual step is required. Delete any existing policy assignments that are marked as [Deprecated], and replace them with the updated prerequisite policy initiative and policy definitions that have the same name as the original.
For a detailed narrative, see the blog post Important change released for Guest Configuration audit policies.
Scenario: The add-on is unable to reach the Azure Policy service endpoint because of egress restrictions
The add-on can't reach the Azure Policy service endpoint, and it returns one of the following errors:
failed to fetch token, service not reachable
Error getting file "Get https://raw.githubusercontent.com/Azure/azure-policy/master/built-in-references/Kubernetes/container-allowed-images/template.yaml: dial tcp 151.101.228.133.443: connect: connection refused
This issue occurs when a cluster egress is locked down.
Ensure that the domains and ports mentioned in the following article are open:
Scenario: The add-on is unable to reach the Azure Policy service endpoint because of the aad-pod-identity configuration
The add-on can't reach the Azure Policy service endpoint, and it returns one of the following errors:
azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://gov-prod-policy-data.trafficmanager.cn/checkDataPolicyCompliance?api-version=2019-01-01-preview: StatusCode=404
adal: Refresh request failed. Status Code = '404'. Response body: getting assigned identities for pod kube-system/azure-policy-8c785548f-r882p in CREATED state failed after 16 attempts, retry duration [5]s, error: <nil>
This error occurs when aad-pod-identity is installed on the cluster and the kube-system pods aren't excluded in aad-pod-identity.
The aad-pod-identity component Node Managed Identity (NMI) pods modify the nodes' iptables to intercept calls to the Azure instance metadata endpoint. This setup means that any request made to the metadata endpoint is intercepted by NMI, even if the pod doesn't use aad-pod-identity. The AzurePodIdentityException CustomResourceDefinition (CRD) can be configured to inform aad-pod-identity that any requests to a metadata endpoint that originate from a pod matching the labels defined in the CRD should be proxied without any processing in NMI.
Exclude the system pods that have the kubernetes.azure.com/managedby: aks label in the kube-system namespace in aad-pod-identity by configuring the AzurePodIdentityException CRD.
For more information, see Disable the Azure Active Directory (Azure AD) pod identity for a specific pod/application.
To configure an exception, follow this example:
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzurePodIdentityException
metadata:
  name: mic-exception
  namespace: default
spec:
  podLabels:
    app: mic
    component: mic
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzurePodIdentityException
metadata:
  name: aks-addon-exception
  namespace: kube-system
spec:
  podLabels:
    kubernetes.azure.com/managedby: aks
The add-on can reach the Azure Policy service endpoint, but the add-on logs display one of the following errors:
The resource provider 'Microsoft.PolicyInsights' is not registered in subscription '{subId}'. See https://aka.ms/policy-register-subscription for how to register subscriptions.
policyinsightsdataplane.BaseClient#CheckDataPolicyCompliance: Failure responding to request: StatusCode=500 -- Original Error: autorest/azure: Service returned an error. Status=500 Code="InternalServerError" Message="Encountered an internal server error.
The Microsoft.PolicyInsights resource provider isn't registered. It must be registered for the add-on to get policy definitions and return compliance data.
Register the Microsoft.PolicyInsights resource provider in the cluster subscription. For instructions, see Register a resource provider.
The add-on can reach the Azure Policy service endpoint, but the following error is displayed:
The subscription '{subId}' has been disabled for azure data-plane policy. Please contact support.
This error means that the subscription was determined to be problematic, and the feature flag Microsoft.PolicyInsights/DataPlaneBlocked was added to block the subscription.
To investigate and resolve this issue, contact the feature team.
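When you review add-on logs, it can help to map each error line to the matching scenario in this article. The following Python sketch does that with simple substring matching. The table and function names are hypothetical, the matched fragments are taken from the error messages quoted in this article, and real log output might be worded differently.

```python
from typing import List, Tuple

EGRESS = "Cluster egress might be locked down: check that the required domains and ports are open."
POD_IDENTITY = "aad-pod-identity might be intercepting the call: configure an AzurePodIdentityException for kube-system pods."
REGISTER = "Register the Microsoft.PolicyInsights resource provider in the cluster subscription."
BLOCKED = "The subscription is blocked for data-plane policy: contact the feature team."

# (lowercase fragment of a log line, suggested next step), checked in order.
KNOWN_ERRORS: List[Tuple[str, str]] = [
    ("service not reachable", EGRESS),
    ("connection refused", EGRESS),
    ("getting assigned identities for pod", POD_IDENTITY),
    ("failed to refresh the token", POD_IDENTITY),
    ("is not registered in subscription", REGISTER),
    ("internalservererror", REGISTER),
    ("has been disabled for azure data-plane policy", BLOCKED),
]


def triage(log_line: str) -> str:
    """Return the suggested next step for a log line from the add-on."""
    lowered = log_line.lower()
    for fragment, hint in KNOWN_ERRORS:
        if fragment in lowered:
            return hint
    return "No known match."


print(triage("failed to fetch token, service not reachable"))
```

Because the match is by substring, an unrelated log line that happens to contain one of these fragments gets the same hint, so confirm against the full error text.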
When attempting to create a custom policy definition from the Azure portal page for policy definitions, you select the Duplicate definition button. After assigning the policy, you find machines are NonCompliant because no guest configuration assignment resource exists.
Guest configuration relies on custom metadata added to policy definitions when creating guest configuration assignment resources. The Duplicate definition activity in the Azure portal doesn't copy custom metadata.
Instead of using the portal, duplicate the policy definition using the Policy Insights API. The following PowerShell sample provides an option.
# Duplicates the built-in policy that audits Windows machines for pending reboots.
# Metadata is passed explicitly so the guest configuration metadata is copied too.
$def = Get-AzPolicyDefinition -Id '/providers/Microsoft.Authorization/policyDefinitions/4221adbc-5c0f-474f-88b7-037a99e6114c' | ForEach-Object Properties
New-AzPolicyDefinition -Name (New-Guid).Guid -DisplayName "$($def.DisplayName) (Copy)" -Description $def.Description -Metadata ($def.Metadata | ConvertTo-Json -Depth 15) -Parameter ($def.Parameters | ConvertTo-Json -Depth 15) -Policy ($def.PolicyRule | ConvertTo-Json -Depth 15)
Scenario: Kubernetes resource gets created during connectivity failure despite deny policy being assigned
If there's a Kubernetes cluster connectivity failure, evaluation for newly created or updated resources might be bypassed due to Gatekeeper's fail-open behavior.
The Gatekeeper fail-open model is by design and based on community feedback. Gatekeeper documentation expands on these reasons here: https://open-policy-agent.github.io/gatekeeper/website/docs/failing-closed#considerations.
In this event, the error case can be monitored from the admission webhook metrics provided by kube-apiserver. If evaluation is bypassed at creation time and an object is created, it's reported as noncompliant in Azure Policy compliance to flag it to customers.
Regardless of the scenario, Azure Policy retains the last known policy on the cluster and keeps the guardrails in place.
If your problem isn't listed in this article or you can't resolve it, get support by visiting one of the following channels:
Get answers from experts through Microsoft Q&A.
If you still need help, go to the Azure support site and submit your request.