Understand Azure Policy for Kubernetes clusters

Azure Policy extends Gatekeeper v3, an admission controller webhook for Open Policy Agent (OPA), to apply at-scale enforcements and safeguards on your clusters in a centralized, consistent manner. Azure Policy makes it possible to manage and report on the compliance state of your Kubernetes clusters from one place. The add-on enacts the following functions:

  • Checks with Azure Policy service for policy assignments to the cluster.
  • Deploys policy definitions into the cluster as constraint template and constraint custom resources.
  • Reports auditing and compliance details back to Azure Policy service.

Azure Policy for Kubernetes supports the following cluster environments:

Important

The add-ons for AKS Engine are in preview. Azure Policy for Kubernetes only supports Linux node pools and built-in policy definitions (custom policy definitions is a public preview feature). Built-in policy definitions are in the Kubernetes category. The limited preview policy definitions with EnforceOPAConstraint and EnforceRegoPolicy effect and the related Kubernetes Service category are deprecated. Instead, use the effects audit and deny with Resource Provider mode Microsoft.Kubernetes.Data.

Overview

To enable and use Azure Policy with your Kubernetes cluster, take the following actions:

  1. Configure your Kubernetes cluster and install the add-on:

    Note

    For common issues with installation, see Troubleshoot - Azure Policy Add-on.

  2. Understand the Azure Policy language for Kubernetes

  3. Assign a definition to your Kubernetes cluster

  4. Wait for validation

Limitations

The following general limitations apply to the Azure Policy Add-on for Kubernetes clusters:

  • Azure Policy Add-on for Kubernetes is supported on Kubernetes version 1.14 or higher.

  • Azure Policy Add-on for Kubernetes can only be deployed to Linux node pools.

  • Only built-in policy definitions are supported. Custom policy definitions are a public preview feature.

  • Maximum number of pods supported by the Azure Policy Add-on: 10,000

  • Maximum number of Non-compliant records per policy per cluster: 500

  • Maximum number of Non-compliant records per subscription: 1 million

  • Installations of Gatekeeper outside of the Azure Policy Add-on aren't supported. Uninstall any components installed by a previous Gatekeeper installation before enabling the Azure Policy Add-on.

  • Reasons for non-compliance aren't available for the Microsoft.Kubernetes.Data Resource Provider mode. Use Component details.

  • Component-level exemptions aren't supported for Resource Provider modes.

The following limitations apply only to the Azure Policy Add-on for AKS:

  • Namespaces automatically excluded by Azure Policy Add-on for evaluation: kube-system, gatekeeper-system, and aks-periscope.

Recommendations

The following are general recommendations for using the Azure Policy Add-on:

  • The Azure Policy Add-on requires three Gatekeeper components to run: One audit pod and two webhook pod replicas. These components consume more resources as the count of Kubernetes resources and policy assignments increases in the cluster, which requires audit and enforcement operations.

    • For fewer than 500 pods in a single cluster with a max of 20 constraints: two vCPUs and 350 MB memory per component.
    • For more than 500 pods in a single cluster with a max of 40 constraints: three vCPUs and 600 MB memory per component.
  • Windows pods don't support security contexts. Thus, some of the Azure Policy definitions, such as disallowing root privileges, can't be escalated in Windows pods and only apply to Linux pods.

The following recommendation applies only to AKS and the Azure Policy Add-on:

  • Use system node pool with CriticalAddonsOnly taint to schedule Gatekeeper pods. For more information, see Using system node pools.
  • Secure outbound traffic from your AKS clusters. For more information, see Control egress traffic for cluster nodes.
  • If the cluster has aad-pod-identity enabled, Node Managed Identity (NMI) pods modify the nodes' iptables to intercept calls to the Azure Instance Metadata endpoint. This configuration means any request made to the Metadata endpoint is intercepted by NMI even if the pod doesn't use aad-pod-identity. AzurePodIdentityException CRD can be configured to inform aad-pod-identity that any requests to the Metadata endpoint originating from a pod that matches labels defined in CRD should be proxied without any processing in NMI. The system pods with kubernetes.azure.com/managedby: aks label in kube-system namespace should be excluded in aad-pod-identity by configuring the AzurePodIdentityException CRD. For more information, see Disable aad-pod-identity for a specific pod or application. To configure an exception, install the mic-exception YAML.

Install Azure Policy Add-on for AKS

Before installing the Azure Policy Add-on or enabling any of the service features, your subscription must enable the Microsoft.PolicyInsights resource providers.

  1. You need the Azure CLI version 2.12.0 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install the Azure CLI.

  2. Register the resource providers and preview features.

    • Azure portal:

      Register the Microsoft.PolicyInsights resource providers. For steps, see Resource providers and types.

    • Azure CLI:

      # Log in first with az login
      az cloud set -n AzureChinaCloud
      az login
      
      # Provider register: Register the Azure Policy provider
      az provider register --namespace Microsoft.PolicyInsights
      
  3. If limited preview policy definitions were installed, remove the add-on with the Disable button on your AKS cluster under the Policies page.

  4. The AKS cluster must be version 1.14 or higher. Use the following script to validate your AKS cluster version:

    # Log in first with az login
    az cloud set -n AzureChinaCloud
    az login
    
    # Look for the value in kubernetesVersion
    az aks list
    
  5. Install version 2.12.0 or higher of the Azure CLI. For more information, see Install the Azure CLI.

Once the above prerequisite steps are completed, install the Azure Policy Add-on in the AKS cluster you want to manage.

  • Azure portal

    1. Launch the AKS service in the Azure portal by selecting All services, then searching for and selecting Kubernetes services.

    2. Select one of your AKS clusters.

    3. Select Policies on the left side of the Kubernetes service page.

    4. In the main page, select the Enable add-on button.

  • Azure CLI

    # Log in first with az login
    az cloud set -n AzureChinaCloud
    az login
    
    az aks enable-addons --addons azure-policy --name MyAKSCluster --resource-group MyResourceGroup
    

To validate that the add-on installation was successful and that the azure-policy and gatekeeper pods are running, run the following command:

# azure-policy pod is installed in kube-system namespace
kubectl get pods -n kube-system

# gatekeeper pod is installed in gatekeeper-system namespace
kubectl get pods -n gatekeeper-system

Lastly, verify that the latest add-on is installed by running this Azure CLI command, replacing <rg> with your resource group name and <cluster-name> with the name of your AKS cluster: az aks show --query addonProfiles.azurepolicy -g <rg> -n <cluster-name>. The result should look similar to the following output:

{
        "config": null,
        "enabled": true,
        "identity": null
}

Policy language

The Azure Policy language structure for managing Kubernetes follows that of existing policy definitions. With a Resource Provider mode of Microsoft.Kubernetes.Data, the effects audit and deny are used to manage your Kubernetes clusters. Audit and deny must provide details properties specific to working with OPA Constraint Framework and Gatekeeper v3.

As part of the details.templateInfo, details.constraint, or details.constraintTemplate properties in the policy definition, Azure Policy passes the URI or Base64Encoded value of these CustomResourceDefinitions (CRD) to the add-on. Rego is the language that OPA and Gatekeeper support to validate a request to the Kubernetes cluster. By supporting an existing standard for Kubernetes management, Azure Policy makes it possible to reuse existing rules and pair them with Azure Policy for a unified cloud compliance reporting experience. For more information, see What is Rego?.

Assign a policy definition

To assign a policy definition to your Kubernetes cluster, you must be assigned the appropriate Azure role-based access control (Azure RBAC) policy assignment operations. The Azure built-in roles Resource Policy Contributor and Owner have these operations. To learn more, see Azure RBAC permissions in Azure Policy.

Note

Custom policy definitions is a public preview feature.

Find the built-in policy definitions for managing your cluster using the Azure portal with the following steps. If using a custom policy definition, search for it by name or the category that you created it with.

  1. Start the Azure Policy service in the Azure portal. Select All services in the left pane and then search for and select Policy.

  2. In the left pane of the Azure Policy page, select Definitions.

  3. From the Category dropdown list box, use Select all to clear the filter and then select Kubernetes.

  4. Select the policy definition, then select the Assign button.

  5. Set the Scope to the management group, subscription, or resource group of the Kubernetes cluster where the policy assignment will apply.

    Note

    When assigning the Azure Policy for Kubernetes definition, the Scope must include the cluster resource. For an AKS Engine cluster, the Scope must be the resource group of the cluster.

  6. Give the policy assignment a Name and Description that you can use to identify it easily.

  7. Set the Policy enforcement to one of the values below.

    • Enabled - Enforce the policy on the cluster. Kubernetes admission requests with violations are denied.

    • Disabled - Don't enforce the policy on the cluster. Kubernetes admission requests with violations aren't denied. Compliance assessment results are still available. When rolling out new policy definitions to running clusters, Disabled option is helpful for testing the policy definition as admission requests with violations aren't denied.

  8. Select Next.

  9. Set parameter values

    • To exclude Kubernetes namespaces from policy evaluation, specify the list of namespaces in parameter Namespace exclusions. It's recommended to exclude: kube-system, gatekeeper-system, and azure-arc.
  10. Select Review + create.

Alternately, use the Assign a policy - Portal quickstart to find and assign a Kubernetes policy. Search for a Kubernetes policy definition instead of the sample 'audit vms'.

Important

Built-in policy definitions are available for Kubernetes clusters in category Kubernetes. For a list of built-in policy definitions, see Kubernetes samples.

Policy evaluation

The add-on checks in with Azure Policy service for changes in policy assignments every 15 minutes. During this refresh cycle, the add-on checks for changes. These changes trigger creates, updates, or deletes of the constraint templates and constraints.

In a Kubernetes cluster, if a namespace has the cluster-appropriate label, the admission requests with violations aren't denied. Compliance assessment results are still available.

  • Azure Kubernetes Service cluster: control-plane

Note

While a cluster admin may have permission to create and update constraint templates and constraints resources install by the Azure Policy Add-on, these aren't supported scenarios as manual updates are overwritten. Gatekeeper continues to evaluate policies that existed prior to installing the add-on and assigning Azure Policy policy definitions.

Every 15 minutes, the add-on calls for a full scan of the cluster. After gathering details of the full scan and any real-time evaluations by Gatekeeper of attempted changes to the cluster, the add-on reports the results back to Azure Policy for inclusion in compliance details like any Azure Policy assignment. Only results for active policy assignments are returned during the audit cycle. Audit results can also be seen as violations listed in the status field of the failed constraint. For details on Non-compliant resources, see Component details for Resource Provider modes.

Note

Each compliance report in Azure Policy for your Kubernetes clusters include all violations within the last 45 minutes. The timestamp indicates when a violation occurred.

Some other considerations:

  • If the cluster subscription is registered with Azure Security Center, then Azure Security Center Kubernetes policies are applied on the cluster automatically.

  • When a deny policy is applied on cluster with existing Kubernetes resources, any pre-existing resource that is not compliant with the new policy continues to run. When the non-compliant resource gets rescheduled on a different node the Gatekeeper blocks the resource creation.

  • When a cluster has a deny policy that validates resources, the user will not see a rejection message when creating a deployment. For example, consider a Kubernetes deployment that contains replicasets and pods. When a user executes kubectl describe deployment $MY_DEPLOYMENT, it does not return a rejection message as part of events. However, kubectl describe replicasets.apps $MY_DEPLOYMENT returns the events associated with rejection.

Note

Init containers may be included during policy evaluation. To see if init containers are included, review the CRD for the following or a similar declaration:

input_containers[c] { 
   c := input.review.object.spec.initContainers[_] 
}

Constraint template conflicts

If constraint templates have the same resource metadata name, but the policy definition references the source at different locations, the policy definitions are considered to be in conflict. Example: Two policy definitions reference the same template.yaml file stored at different source locations such as the Azure Policy template store (store.policy.core.chinacloudapi.cn) and GitHub.

When policy definitions and their constraint templates are assigned but aren't already installed on the cluster and are in conflict, they are reported as a conflict and won't be installed into the cluster until the conflict is resolved. Likewise, any existing policy definitions and their constraint templates that are already on the cluster that conflict with newly assigned policy definitions continue to function normally. If an existing assignment is updated and there is a failure to sync the constraint template, the cluster is also marked as a conflict. For all conflict messages, see AKS Resource Provider mode compliance reasons

Logging

As a Kubernetes controller/container, both the azure-policy and gatekeeper pods keep logs in the Kubernetes cluster. The logs can be exposed in the Insights page of the Kubernetes cluster. For more information, see Monitor your Kubernetes cluster performance with Azure Monitor for containers.

To view the add-on logs, use kubectl:

# Get the azure-policy pod name installed in kube-system namespace
kubectl logs <azure-policy pod name> -n kube-system

# Get the gatekeeper pod name installed in gatekeeper-system namespace
kubectl logs <gatekeeper pod name> -n gatekeeper-system

For more information, see Debugging Gatekeeper in the Gatekeeper documentation.

Troubleshooting the add-on

For more information about troubleshooting the Add-on for Kubernetes, see the Kubernetes section of the Azure Policy troubleshooting article.

Remove the add-on

Remove the add-on from AKS

To remove the Azure Policy Add-on from your AKS cluster, use either the Azure portal or Azure CLI:

  • Azure portal

    1. Launch the AKS service in the Azure portal by selecting All services, then searching for and selecting Kubernetes services.

    2. Select your AKS cluster where you want to disable the Azure Policy Add-on.

    3. Select Policies on the left side of the Kubernetes service page.

    4. In the main page, select the Disable add-on button.

  • Azure CLI

    # Log in first with az login
    az cloud set -n AzureChinaCloud
    az login
    
    az aks disable-addons --addons azure-policy --name MyAKSCluster --resource-group MyResourceGroup
    

Diagnostic data collected by Azure Policy Add-on

The Azure Policy Add-on for Kubernetes collects limited cluster diagnostic data. This diagnostic data is vital technical data related to software and performance. It's used in the following ways:

  • Keep Azure Policy Add-on up to date
  • Keep Azure Policy Add-on secure, reliable, performant
  • Improve Azure Policy Add-on - through the aggregate analysis of the use of the add-on

The information collected by the add-on isn't personal data. The following details are currently collected:

  • Azure Policy Add-on agent version

  • Cluster type

  • Cluster region

  • Cluster resource group

  • Cluster resource ID

  • Cluster subscription ID

  • Cluster OS (Example: Linux)

  • Cluster city

  • Cluster state or province

  • Cluster country or region

  • Exceptions/errors encountered by Azure Policy Add-on during agent installation on policy evaluation

  • Number of Gatekeeper policy definitions not installed by Azure Policy Add-on

Next steps