This article provides troubleshooting guidance for various CoreDNS issues on Azure Kubernetes Service (AKS).
Debug DNS resolution issues
For general CoreDNS troubleshooting steps, such as checking the endpoints or resolution, see Debugging DNS resolution.
Troubleshoot CoreDNS pod traffic imbalance
You might see one or two CoreDNS pods showing significantly higher CPU usage and handling more DNS queries than others, even with multiple CoreDNS pods running in your AKS cluster. This is a known issue in Kubernetes and can lead to one of the CoreDNS pods being overloaded and crashing.
This uneven distribution of DNS queries is primarily caused by User Datagram Protocol (UDP) load balancing limitations in Kubernetes. The platform uses a five-tuple hash (source IP, source port, destination IP, destination port, protocol) to distribute UDP traffic, so if an application reuses the same source port for DNS queries, all queries from that client are routed to the same CoreDNS pod. This distribution method can result in a single pod handling a disproportionate amount of traffic. Additionally, some applications use connection pooling and reuse DNS connections. This behavior can further concentrate DNS queries on a single CoreDNS pod, increasing the imbalance and the risk of overloading and potential crashes.
The following sections help you troubleshoot and mitigate this issue.
Enable DNS query logging
Enable DNS query logging to capture required DNS query logs from CoreDNS pods.
1. Add the following configuration to your `coredns-custom` ConfigMap. For example, save the configuration in a file named `corednsms.yaml`:

   ```yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: coredns-custom
     namespace: kube-system
   data:
     log.override: | # You can select any name here, but it must end with the .override file extension
       log
   ```

2. Apply the ConfigMap changes using the `kubectl apply` command.

   ```console
   kubectl apply -f corednsms.yaml
   ```

3. Perform a rolling restart to reload the ConfigMap and enable the Kubernetes Scheduler to restart CoreDNS without downtime using the `kubectl rollout restart` command.

   ```console
   kubectl --namespace kube-system rollout restart deployment coredns
   ```

4. View the CoreDNS debug logging using the `kubectl logs` command.

   ```console
   kubectl logs --namespace kube-system -l k8s-app=kube-dns
   ```
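Optionally, you can stream the logs from all CoreDNS pods at once and prefix each line with the pod that emitted it, which makes it easier to see at a glance whether one pod is serving most of the queries. This is a convenience command rather than a required step, and the `--prefix` flag requires a reasonably recent version of kubectl.

```console
# Stream query logs from all CoreDNS pods; each line is prefixed with the pod name.
kubectl logs --namespace kube-system -l k8s-app=kube-dns -f --prefix
```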
Check your CoreDNS pod traffic distribution
1. Get the names of all CoreDNS pods in your cluster using the `kubectl get pods` command.

   ```console
   kubectl get pods --namespace kube-system -l k8s-app=kube-dns
   ```

2. Review the logs for each CoreDNS pod to analyze DNS query patterns using the `kubectl logs` command. Repeat this command for all CoreDNS pods, replacing `<coredns-pod-x>` with the actual pod names.

   ```console
   kubectl logs --namespace kube-system <coredns-pod-x>
   ```

3. In the outputs, look for repeated client IP addresses and ports that appear only in the logs of a single CoreDNS pod. This indicates that DNS queries from certain clients aren't being distributed evenly.

   Example log output:

   ```console
   [INFO] 10.244.0.247:5556 - 42621 "A IN myservice.default.svc.cluster.local. udp 28" NOERROR qr,aa,rd 106 0.000141s
   ```

   In this example log entry:

   - `10.244.0.247` is the client IP address making the DNS query.
   - `5556` is the client source port.
   - `42621` is the query ID.

   If you see the same client IP and port repeatedly in only one pod's logs, this confirms a traffic imbalance.
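To quantify the imbalance instead of scanning individual log lines, you can count logged queries per pod. The following loop is a rough sketch: it assumes query logging from the previous section is enabled, and the `grep` pattern is only an approximation that counts log lines containing a query record.

```bash
# Approximate per-pod query counts; assumes the CoreDNS log plugin is enabled.
for pod in $(kubectl get pods --namespace kube-system -l k8s-app=kube-dns -o name); do
  count=$(kubectl logs --namespace kube-system "$pod" | grep -c ' IN ')
  echo "$pod handled roughly $count logged queries"
done
```

A heavily skewed count across pods confirms the traffic imbalance described above.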
Mitigate CoreDNS pod traffic imbalance
If you notice an imbalance, your application might be reusing UDP source ports or pooling its DNS connections. Based on the root cause, you can take the following mitigation actions:
- Caused by UDP source port reuse: UDP source port reuse occurs when a client application sends multiple DNS queries from the same UDP source port. If this is the issue, update your applications or DNS clients to randomize source ports for each DNS query, which helps distribute requests more evenly across pods.
- Caused by connection pooling: Connection pools are mechanisms applications use to reuse existing network connections instead of creating a new connection for each request. While this improves efficiency, it can result in all DNS queries from an application being sent over the same connection, and thus routed to the same CoreDNS pod. To mitigate this, adjust your application's DNS connection handling by reducing connection Time to Live (TTL) or randomizing connection creation, ensuring queries aren't concentrated on a single CoreDNS pod.
These changes can help achieve a more balanced DNS query distribution and reduce the risk of overloading individual pods.
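If you're unsure whether a workload is reusing a single UDP socket for its DNS traffic, one way to check is to list the open UDP sockets inside the client pod you identified in the CoreDNS logs. This is only a sketch: `<client-pod>` is a placeholder, and it assumes the container image ships the `ss` utility, which many minimal images don't.

```console
# Look for a long-lived UDP socket to the cluster DNS service on port 53.
kubectl exec <client-pod> -- ss -u -n -p
```

A persistent entry with a fixed local port connected to the DNS service IP on port 53 suggests source-port reuse or connection pooling in that application.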
Troubleshoot invalid search domain completions for internal.chinacloudapp.cn and reddog.microsoft.com
Azure DNS configures a default search domain of <VNET_ID>.<REGION>.internal.chinacloudapp.cn in virtual networks (VNets) using Azure DNS and a nonfunctional stub reddog.microsoft.com in VNets using custom DNS servers. For more information, see the Name resolution for resources documentation.
Kubernetes configures pod DNS settings with ndots: 5 to properly support cluster service hostname resolution. These two configurations combine to result in invalid search domain completion queries that never succeed being sent to upstream name servers while the system processes through the domain search list. These invalid queries cause name resolution delays and can place extra load on upstream DNS servers.
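To see how these two settings interact in your own cluster, you can inspect the resolver configuration inside any running pod. The command and sample output below are illustrative only: `<pod-name>` is a placeholder, and the nameserver IP and search list vary by cluster and virtual network.

```console
kubectl exec <pod-name> -- cat /etc/resolv.conf
```

Typical output looks similar to the following:

```console
nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local myvnetid.myregion.internal.chinacloudapp.cn
options ndots:5
```

Because `ndots:5` causes any name with fewer than five dots to be tried against each search domain first, a lookup such as `mcr.azk8s.cn` produces queries like `mcr.azk8s.cn.myvnetid.myregion.internal.chinacloudapp.cn` before the name is tried as-is, which is exactly the kind of invalid search domain completion described above.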
As of the v20241025 AKS release, AKS configures CoreDNS to respond with NXDOMAIN in the following cases in order to prevent these invalid search domain completion queries from being forwarded to upstream DNS:
- Any query for the root domain or a subdomain of `reddog.microsoft.com`.
- Any query for a subdomain of `internal.chinacloudapp.cn` that has seven or more labels in the domain name.
  - This configuration allows virtual machine (VM) resolution by hostname to still succeed. For example, CoreDNS sends `aks12345.myvnetid.myregion.internal.chinacloudapp.cn` (six labels) to Azure DNS, but rejects `mcr.azk8s.cn.myvnetid.myregion.internal.chinacloudapp.cn` (eight labels).
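If you want to confirm this behavior, you can send a test query from a temporary pod and check that it isn't resolved. The image and test name below are only examples; any name under `reddog.microsoft.com` should come back as NXDOMAIN (reported as a resolution failure by most client tools) directly from CoreDNS, without being forwarded upstream.

```console
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup test.reddog.microsoft.com
```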
This block is implemented in the default server block in the Corefile for the cluster. If needed, you can disable this rejection configuration by creating custom server blocks for the appropriate domains with a forward plugin enabled:
1. Create a file named `corednsms.yaml` and paste in the following example configuration. Make sure to update the IP addresses and hostnames with your own values.

   ```yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: coredns-custom # This is the name of the ConfigMap you can overwrite with your changes
     namespace: kube-system
   data:
     override-block.server: |
       internal.chinacloudapp.cn:53 {
           errors
           cache 30
           forward . /etc/resolv.conf
       }
       reddog.microsoft.com:53 {
           errors
           cache 30
           forward . /etc/resolv.conf
       }
   ```

2. Create the ConfigMap using the `kubectl apply` command.

   ```console
   kubectl apply -f corednsms.yaml
   ```

3. Perform a rolling restart to reload the ConfigMap and enable the Kubernetes Scheduler to restart CoreDNS without downtime using the `kubectl rollout restart` command.

   ```console
   kubectl --namespace kube-system rollout restart deployment coredns
   ```
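After the restart, you can verify that the rollout finished and that the custom ConfigMap is present before testing resolution for these domains again. These are standard checks and assume the default `coredns` deployment and the `coredns-custom` ConfigMap names used above.

```console
kubectl rollout status deployment coredns --namespace kube-system
kubectl get configmap coredns-custom --namespace kube-system -o yaml
```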
Troubleshoot CoreDNS autoscaling issues
To troubleshoot CoreDNS autoscaling issues, see Autoscaling CoreDNS in Azure Kubernetes Service (AKS).