Troubleshoot AKS engine on Azure Stack Hub
You may run into issues when deploying or working with AKS engine on Azure Stack Hub. This article walks through the steps to troubleshoot your AKS engine deployment: collect information about your AKS engine installation, collect Kubernetes logs, and review custom script extension error codes. You can also open a GitHub issue for AKS engine.
Note
For AKS engine version 0.75.3 and above, the aks-engine commands below begin with aks-engine-azurestack rather than aks-engine.
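For example, on a workstation with AKS engine 0.75.3 or later, the same subcommands are run through the renamed binary. This is a minimal illustration using the version subcommand; per the note above, only the binary name differs:
# AKS engine 0.75.3 and above
aks-engine-azurestack version
# Earlier versions
aks-engine version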
Troubleshoot the AKS engine install
If your previous installation steps failed, you can install AKS engine using the GoFish package manager. GoFish describes itself as a cross-platform Homebrew.
You can find instructions for using GoFish to install the AKS engine here.
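As a rough sketch, an install through GoFish usually comes down to a single command. The package name aks-engine here is an assumption based on the upstream AKS engine documentation, so verify it against the instructions linked above:
# Install AKS engine with the GoFish package manager
gofish install aks-engine
# Confirm the binary is available
aks-engine version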
Collect node and cluster logs
You can find the instructions on collecting node and cluster logs at Retrieving Node and Cluster Logs.
Prerequisites
This guide assumes that you've already downloaded the Azure CLI and the AKS engine, and that you've deployed a cluster with AKS engine. For more information, see Deploy a Kubernetes cluster with AKS engine on Azure Stack Hub.
Retrieving logs
The aks-engine get-logs command can be useful to troubleshoot issues with your cluster. The command produces, collects, and downloads a set of files to your workstation. The files include node configuration, cluster state and configuration, and setup log files.
At a high level: the command works by establishing an SSH session into each node, executing a log collection script that collects and zips relevant files, and downloading the .ZIP file to your local computer.
SSH authentication
You'll need a valid SSH private key to establish an SSH session to the cluster Linux nodes. Windows credentials are stored in the API model and are loaded from there. Set windowsProfile.sshEnabled to true to enable SSH on your Windows nodes.
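Before running aks-engine get-logs, it can help to confirm that the private key actually opens a session to a node. In this sketch, azureuser is a placeholder for the admin username set in the linuxProfile section of your API model, and <node-ip-or-fqdn> is a placeholder for a reachable node address:
# Verify the SSH private key can reach a Linux node
ssh -i ~/.ssh/id_rsa azureuser@<node-ip-or-fqdn> hostname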
Upload logs to a storage account container
Once the cluster logs have been successfully retrieved, AKS engine can save them in an Azure Storage account container if the optional --upload-sas-url parameter is set. AKS engine expects the container name to be part of the provided SAS URL. The expected format is https://{blob-service-uri}/{container-name}?{sas-token}.
Note
Storage accounts on custom clouds using the AD FS identity provider are not yet supported.
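As a sketch, a get-logs invocation that also uploads the collected archives might look like the following; the storage account, container, and SAS token values are placeholders that follow the format described above:
aks-engine get-logs \
--location <location> \
--api-model _output/<dnsPrefix>/apimodel.json \
--ssh-host <dnsPrefix>.<location>.cloudapp.chinacloudapi.cn \
--linux-ssh-private-key ~/.ssh/id_rsa \
--upload-sas-url "https://<storage-account>.blob.<endpoint>/<container-name>?<sas-token>"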
Nodes unable to join the cluster
By default, aks-engine get-logs collects logs from nodes that successfully joined the cluster. To collect logs from VMs that weren't able to join the cluster, set the --vm-names flag:
--vm-names k8s-pool-01,k8s-pool-02
Usage for aks-engine get-logs
Assuming that you have a cluster deployed, and that the API model originally used to deploy that cluster is stored at _output/<dnsPrefix>/apimodel.json, you can collect logs by running a command like:
aks-engine get-logs \
--location <location> \
--api-model _output/<dnsPrefix>/apimodel.json \
--ssh-host <dnsPrefix>.<location>.cloudapp.chinacloudapi.cn \
--linux-ssh-private-key ~/.ssh/id_rsa
Parameters
Parameter | Required | Description |
---|---|---|
--location | Yes | Azure location of the cluster's resource group. |
--api-model | Yes | Path to the generated API model for the cluster. |
--ssh-host | Yes | FQDN, or IP address, of an SSH listener that can reach all nodes in the cluster. |
--linux-ssh-private-key | Yes | Path to an SSH private key that can be used to create a remote session on the cluster Linux nodes. |
--output-directory | No | Output directory, derived from --api-model if missing. |
--control-plane-only | No | Only collect logs from control plane nodes. |
--vm-names | No | Only collect logs from the specified VMs (comma-separated names). |
--upload-sas-url | No | Azure Storage Account SAS URL to upload the collected logs. |
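For example, a variation of the command above that collects only control plane logs and writes the archives to a specific folder might look like this; the output path ./cluster-logs is just a placeholder:
aks-engine get-logs \
--location <location> \
--api-model _output/<dnsPrefix>/apimodel.json \
--ssh-host <dnsPrefix>.<location>.cloudapp.chinacloudapi.cn \
--linux-ssh-private-key ~/.ssh/id_rsa \
--control-plane-only \
--output-directory ./cluster-logs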
Review custom script extension error codes
The AKS engine produces a script for each Ubuntu server as a resource for the custom script extension (CSE) to perform deployment tasks. If the script throws an error, it logs the error in /var/log/azure/cluster-provision.log. The errors are also displayed in the portal. The error code may be helpful in figuring out the cause of the problem. For more information about the CSE exit codes, see cse_helpers.sh.
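If you have an SSH session open on an affected node, one quick way to surface the CSE result is to scan the provisioning log. This is only a sketch; the grep pattern is illustrative rather than an exhaustive match for every error format:
# Show the end of the CSE provisioning log on the node
sudo tail -n 50 /var/log/azure/cluster-provision.log
# Look for error messages and exit codes
sudo grep -iE 'error|exit' /var/log/azure/cluster-provision.log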
Provide Kubernetes logs to an Azure support engineer
If, after collecting and examining logs, you still can't resolve your issue, you may want to open a support ticket and provide the logs that you collected.
Your operator may combine the logs you produced with other system logs that Azure support needs, and make them available to Azure support.
You can provide Kubernetes logs in several ways:
- You can contact your Azure Stack Hub operator. Your operator uses the information from the logs stored in the .ZIP file to create the support case.
- If you have the SAS URL for a storage account where you can upload your Kubernetes logs, you can include the --upload-sas-url flag with the SAS URL to save the logs to the storage account:
aks-engine get-logs --upload-sas-url <SAS-URL>
For instructions, see Upload logs to a storage account container.
- If you're a cloud operator, you can:
- Use the Help + support blade in the Azure Stack Hub Administration portal to upload logs. For instructions, see Send logs now with the administrator portal.
- Use the Get-AzureStackLog PowerShell cmdlet through the privileged endpoint (PEP). For instructions, see Send logs now with PowerShell.
Open GitHub issues
If you're unable to resolve your deployment error, you can open a GitHub Issue.
Open a GitHub Issue in the AKS engine repository.
Add a title using the following format: CSE error: exit code <INSERT_YOUR_EXIT_CODE>.
Include the following information in the issue (one way to collect it is shown in the sketch after this list):
- The cluster configuration file, apimodel.json, used to deploy the cluster. Remove all secrets and keys before posting it on GitHub.
- The output of the kubectl command get nodes.
- The content of /var/log/azure/cluster-provision.log from an unhealthy node.
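One way to gather that information from your workstation is sketched below; azureuser is a placeholder for the admin username from your API model, and <node-ip-or-fqdn> is a placeholder for the unhealthy node's address:
# Capture the node status reported by the cluster
kubectl get nodes -o wide > nodes.txt
# Copy the CSE provisioning log from the unhealthy node
scp -i ~/.ssh/id_rsa azureuser@<node-ip-or-fqdn>:/var/log/azure/cluster-provision.log .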
Next steps
- Read about the AKS engine on Azure Stack Hub.