Filter container log collection with ConfigMap

Kubernetes clusters generate a large amount of data that's collected by Azure Monitor. Since you're charged for the ingestion and retention of this data, you can significantly reduce your monitoring costs by filtering out data that you don't need.

A ConfigMap is a Kubernetes mechanism that lets you store non-confidential data such as configuration files or environment variables. Container insights looks for a ConfigMap on each cluster with specific settings that define the data it should collect.
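
The following is a minimal sketch of how this ConfigMap is structured. The name (container-azm-ms-agentconfig) and namespace (kube-system) match the commands and output used later in this article. The data key names shown here follow the template's layout, but verify them against the template you download; the settings under the data key are illustrative only.

apiVersion: v1
kind: ConfigMap
metadata:
  # Container insights looks for this name in the namespace where the agent pods run.
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: ver1
  # Collection settings are stored as TOML text under data keys such as this one.
  log-data-collection-settings: |-
    [log_collection_settings]
       [log_collection_settings.stdout]
          enabled = true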

Tip

Before implementing any of the filtering options described in this article, ensure that you select a log collection profile that matches your requirements. Use the information in this article to further refine the data collection settings for your cluster.

Prerequisites

  • The minimum agent version required to collect stdout, stderr, and environment variables from container workloads is ciprod06142019 or later.

Configure and deploy ConfigMap

Use the following procedure to configure and deploy your ConfigMap configuration file to your cluster:

  1. If you don't already have a ConfigMap for Container insights, download the template ConfigMap YAML file and open it in an editor.

  2. Edit the ConfigMap YAML file with your customizations. The template includes all valid settings with descriptions. To enable a setting, remove the comment character (#) and set its value.

  3. Create a ConfigMap by running the following kubectl command:

    kubectl config use-context <cluster-name>
    kubectl apply -f <configmap_yaml_file.yaml>

    # Example:
    kubectl config use-context my-cluster
    kubectl apply -f container-azm-ms-agentconfig.yaml
    

    The configuration change can take a few minutes to take effect. All Azure Monitor Agent pods in the cluster then restart. The restart is a rolling restart, so not all of the pods restart at the same time. When the restarts are finished, you'll see a message similar to the following result:

    configmap "container-azm-ms-agentconfig" created`.
    

Verify configuration

To verify that the configuration was successfully applied to a cluster, use the following command to review the logs from an agent pod:

kubectl logs ama-logs-fdf58 -n kube-system -c ama-logs

If there are configuration errors from the Azure Monitor Agent pods, the output will show errors similar to the following:

***************Start Config Processing******************** 
config::unsupported/missing config schema version - 'v21' , using defaults

Use the following options to perform more troubleshooting of configuration changes:

  • Use the same kubectl logs command from an agent pod.

  • Review live logs for errors similar to the following:

    config::error::Exception while parsing config map for log collection/env variable settings: \nparse error on value \"$\" ($end), using defaults, please check config map for errors
    

Configuration errors are sent to the KubeMonAgentEvents table in your Log Analytics workspace every hour with Error severity. If there are no errors, the hourly entry has Info severity and reports no errors. The Tags column contains more information about the pod and container ID where the error occurred, along with the first occurrence, last occurrence, and count during the last hour.
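
As an example, you can review these entries with a Log Analytics query along the following lines. This is a minimal sketch: the column names used for filtering and projection (Level, Message, Tags) are assumptions about the KubeMonAgentEvents schema, so adjust them to match the columns in your workspace.

KubeMonAgentEvents
| where TimeGenerated > ago(1h)
// Entries with Error level indicate configuration problems; Info entries report no errors.
| where Level == "Error"
| project TimeGenerated, Level, Message, Tags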

Verify schema version

Supported config schema versions are available as a pod annotation (schema-versions) on the Azure Monitor Agent pod. You can view them with the following kubectl command:

kubectl describe pod ama-logs-fdf58 -n kube-system
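
To print only the annotation value instead of the full pod description, a jsonpath query such as the following can be used. The pod name is the same example name used above; substitute the name of an agent pod in your cluster.

kubectl get pod ama-logs-fdf58 -n kube-system -o jsonpath="{.metadata.annotations['schema-versions']}"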

Filter container logs

Container logs are the stdout and stderr logs generated by containers in your Kubernetes cluster. These logs are stored in the ContainerLogV2 table in your Log Analytics workspace. By default, all container logs are collected, but you can filter out logs from specific namespaces or disable collection of container logs entirely.

Using ConfigMap, you can configure the collection of stderr and stdout logs separately for the cluster, so you can choose to enable one and not the other. The following example shows the ConfigMap settings to collect stdout and stderr excluding the kube-system and gatekeeper-system namespaces.

[log_collection_settings]
    [log_collection_settings.stdout]
        enabled = true
        exclude_namespaces = ["kube-system","gatekeeper-system"]

    [log_collection_settings.stderr]
        enabled = true
        exclude_namespaces = ["kube-system","gatekeeper-system"]

    [log_collection_settings.enrich_container_logs]
        enabled = true
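
After the agent pods restart, you can spot-check that the exclusions are applied with a Log Analytics query similar to the following sketch. The PodNamespace column name is based on the ContainerLogV2 schema; excluded namespaces should stop appearing in new results.

ContainerLogV2
| where TimeGenerated > ago(30m)
// Count recent log records by namespace; excluded namespaces shouldn't show new records.
| summarize LogCount = count() by PodNamespace
| order by LogCount desc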

Note

You can also configure namespace filtering using the Data collection rule (DCR) for the cluster, but this doesn't apply to data sent to ContainerLogV2. This data can only be filtered using the ConfigMap.

Platform log filtering (System Kubernetes namespaces)

By default, container logs from the system namespaces are excluded from collection to minimize Log Analytics costs. In specific troubleshooting scenarios, though, logs from system containers can be critical. This feature is restricted to the following system namespaces: kube-system, gatekeeper-system, calico-system, azure-arc, kube-public, and kube-node-lease.

Enable platform logs using ConfigMap with the collect_system_pod_logs setting. You must also ensure that the system namespace is not in the exclude_namespaces setting.

The following example shows the ConfigMap settings to collect stdout and stderr logs of the coredns container in the kube-system namespace.

[log_collection_settings]
    [log_collection_settings.stdout]
        enabled = true
        exclude_namespaces = ["gatekeeper-system"]
        collect_system_pod_logs = ["kube-system:coredns"]

    [log_collection_settings.stderr]
        enabled = true
        exclude_namespaces = ["kube-system","gatekeeper-system"]
        collect_system_pod_logs = ["kube-system:coredns"]

Annotation-based filtering for workloads

Annotation-based filtering enables you to exclude log collection for certain pods and containers by annotating the pod. This can reduce your log ingestion costs significantly and allows you to focus on relevant information without sifting through noise.

Enable annotation-based filtering using ConfigMap with the following settings.

[log_collection_settings.filter_using_annotations]
   enabled = true

You must also add the required annotations to your workload's pod spec. The following pod annotations are supported.

  • fluentbit.io/exclude: "true" - Excludes both the stdout and stderr streams for all containers in the pod.
  • fluentbit.io/exclude_stdout: "true" - Excludes only the stdout stream for all containers in the pod.
  • fluentbit.io/exclude_stderr: "true" - Excludes only the stderr stream for all containers in the pod.
  • fluentbit.io/exclude_container1: "true" - Excludes both the stdout and stderr streams only for the container named container1 in the pod.
  • fluentbit.io/exclude_stdout_container1: "true" - Excludes only the stdout stream for the container named container1 in the pod.

Note

These annotations are Fluent Bit based. If you use your own Fluent Bit-based log collection solution with the Kubernetes plugin filter and annotation-based exclusion, these annotations stop log collection in both Container insights and your own solution.

The following is an example of the fluentbit.io/exclude: "true" annotation in a pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: apache-logs
  labels:
    app: apache-logs
  annotations:
    fluentbit.io/exclude: "true"
spec:
  containers:
  - name: apache
    image: edsiper/apache_logs
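
For a standalone pod like the example above, you can also add the annotation with kubectl instead of editing the manifest; this is a quick way to test the behavior, and if the exclusion doesn't take effect right away, recreate the pod so the annotation is present at startup. For workloads managed by a controller such as a Deployment or DaemonSet, set the annotation in the controller's pod template so it's applied to every pod it creates.

kubectl annotate pod apache-logs fluentbit.io/exclude="true"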

Filter environment variables

Enable collection of environment variables across all pods and nodes in the cluster using ConfigMap with the following settings.

[log_collection_settings.env_var]
    enabled = true

If collection of environment variables is globally enabled, you can disable it for a specific container by setting the environment variable AZMON_COLLECT_ENV to False, either with a Dockerfile setting or in the env: section of the pod configuration. If collection of environment variables is globally disabled, you can't enable collection for a specific container. The only override that can be applied at the container level is to disable collection when it's already enabled globally.
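
For example, the container-level opt-out might look like the following pod spec sketch. Only the AZMON_COLLECT_ENV variable comes from the setting described above; the pod, container, and image names are placeholders.

apiVersion: v1
kind: Pod
metadata:
  name: my-app                          # placeholder pod name
spec:
  containers:
  - name: my-container                  # placeholder container name
    image: myregistry/my-app:latest     # placeholder image
    env:
    # Disable environment variable collection for this container only,
    # even though collection is enabled globally in the ConfigMap.
    - name: AZMON_COLLECT_ENV
      value: "False"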

ConfigMap settings

The following list describes the settings you can configure to control data collection with ConfigMap. Each entry shows the setting, its data type and accepted values, and a description of its behavior.

  • schema-version (String, case sensitive; value: v1) - Used by the agent when parsing this ConfigMap. The currently supported schema version is v1. Modifying this value isn't supported and is rejected when the ConfigMap is evaluated.

  • config-version (String) - Allows you to keep track of this config file's version in your source control system/repository. The maximum number of characters is 10; all other characters are truncated.

  • [log_collection_settings.stdout] enabled (Boolean: true or false) - Controls whether stdout container log collection is enabled. When set to true and no namespaces are excluded for stdout log collection, stdout logs are collected from all containers across all pods and nodes in the cluster. If not specified in the ConfigMap, the default value is true.

  • [log_collection_settings.stdout] exclude_namespaces (String: comma-separated array) - Array of Kubernetes namespaces for which stdout logs won't be collected. This setting is effective only if enabled is set to true. If not specified in the ConfigMap, the default value is ["kube-system","gatekeeper-system"].

  • [log_collection_settings.stderr] enabled (Boolean: true or false) - Controls whether stderr container log collection is enabled. When set to true and no namespaces are excluded for stderr log collection, stderr logs are collected from all containers across all pods and nodes in the cluster. If not specified in the ConfigMap, the default value is true.

  • [log_collection_settings.stderr] exclude_namespaces (String: comma-separated array) - Array of Kubernetes namespaces for which stderr logs won't be collected. This setting is effective only if enabled is set to true. If not specified in the ConfigMap, the default value is ["kube-system","gatekeeper-system"].

  • [log_collection_settings.env_var] enabled (Boolean: true or false) - Controls environment variable collection across all pods and nodes in the cluster. If not specified in the ConfigMap, the default value is true.

  • [log_collection_settings.enrich_container_logs] enabled (Boolean: true or false) - Controls container log enrichment to populate the Name and Image property values for every log record written to the ContainerLog table for all container logs in the cluster. If not specified in the ConfigMap, the default value is false.

  • [log_collection_settings.collect_all_kube_events] enabled (Boolean: true or false) - Controls whether Kube events of all types are collected. By default, Kube events with type Normal aren't collected. When this setting is true, Normal events are no longer filtered, and all events are collected. If not specified in the ConfigMap, the default value is false.

  • [log_collection_settings.schema] containerlog_schema_version (String, case sensitive: v2 or v1) - Sets the log ingestion format. If v2, the ContainerLogV2 table is used. If v1, the ContainerLog table is used (this table has been deprecated). For clusters enabling Container insights using Azure CLI version 2.54.0 or greater, the default setting is v2. See Container insights log schema for details.

  • [log_collection_settings.enable_multiline_logs] enabled (Boolean: true or false) - Controls whether multiline container logs are enabled. See Multi-line logging in Container Insights for details. If not specified in the ConfigMap, the default value is false. This setting requires the schema setting to be v2.

  • [log_collection_settings.metadata_collection] enabled (Boolean: true or false) - Controls whether metadata is collected in the KubernetesMetadata column of the ContainerLogV2 table.

  • [log_collection_settings.metadata_collection] include_fields (String: comma-separated array) - List of metadata fields to include. If the setting isn't used, all fields are collected. Valid values are ["podLabels","podAnnotations","podUid","image","imageID","imageRepo","imageTag"].

  • [log_collection_settings.multi_tenancy] enabled (Boolean: true or false) - Controls whether multitenancy is enabled. See Multitenant managed logging for details. If not specified in the ConfigMap, the default value is false.

  • [metric_collection_settings.collect_kube_system_pv_metrics] enabled (Boolean: true or false) - Allows persistent volume (PV) usage metrics to be collected in the kube-system namespace. By default, usage metrics for persistent volumes with persistent volume claims in the kube-system namespace aren't collected. When this setting is set to true, PV usage metrics for all namespaces are collected. If not specified in the ConfigMap, the default value is false.

  • [agent_settings.proxy_config] ignore_proxy_settings (Boolean: true or false) - When true, proxy settings are ignored. For both AKS and Arc-enabled Kubernetes environments, if your cluster is configured with a forward proxy, proxy settings are automatically applied and used for the agent. For certain configurations, such as AMPLS + Proxy, you might want the proxy configuration to be ignored. If not specified in the ConfigMap, the default value is false.

  • [agent_settings.fbit_config] enable_internal_metrics (Boolean: true or false) - Controls whether collection of internal metrics is enabled. If not specified in the ConfigMap, the default value is false.

Impact on visualizations and alerts

If you have any custom alerts or workbooks using Container insights data, then modifying your data collection settings might degrade those experiences. If you're excluding namespaces or reducing data collection frequency, review your existing alerts, dashboards, and workbooks using this data.

To scan for alerts that reference tables used by Container insights, run the following Azure Resource Graph query:

resources
| where type in~ ('microsoft.insights/scheduledqueryrules') and ['kind'] !in~ ('LogToMetric')
| extend severity = strcat("Sev", properties["severity"])
| extend enabled = tobool(properties["enabled"])
| where enabled in~ ('true')
| where tolower(properties["targetResourceTypes"]) matches regex 'microsoft.operationalinsights/workspaces($|/.*)?' or tolower(properties["targetResourceType"]) matches regex 'microsoft.operationalinsights/workspaces($|/.*)?' or tolower(properties["scopes"]) matches regex 'providers/microsoft.operationalinsights/workspaces($|/.*)?'
| where properties contains "Perf" or properties  contains "InsightsMetrics" or properties  contains "ContainerInventory" or properties  contains "ContainerNodeInventory" or properties  contains "KubeNodeInventory" or properties  contains"KubePodInventory" or properties  contains "KubePVInventory" or properties  contains "KubeServices" or properties  contains "KubeEvents" 
| project id,name,type,properties,enabled,severity,subscriptionId
| order by tolower(name) asc

Next steps