Troubleshoot the AKS engine on Azure Stack Hub

You may encounter an issue when deploying or working with the AKS engine on Azure Stack Hub. This article covers how to troubleshoot your AKS engine deployment, collect information about your AKS engine, collect Kubernetes logs, review custom script extension error codes, and open a GitHub issue for the AKS engine.

Troubleshoot the AKS engine install

Try GoFish

If your previous installation steps failed, you can install the AKS engine using the GoFish package manager. GoFish describes itself as a cross-platform Homebrew.

Install the AKS engine with GoFish on Linux

Install GoFish from the Install page.

  1. From a bash prompt, run the following command:

    curl -fsSL https://raw.githubusercontent.com/fishworks/gofish/master/scripts/install.sh | bash
  2. Run the following command to install the AKS engine with GoFish:

    gofish install aks-engine
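After the install finishes, it can help to confirm that the binary is actually on your PATH before you move on. The following is a minimal sketch: `verify_tool` is a hypothetical helper name, not part of GoFish or the AKS engine.

```shell
# Hypothetical helper: report whether a command is available on PATH.
verify_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# Example usage once the install has completed:
#   verify_tool aks-engine && aks-engine version
```

If the tool is not found, open a new shell session so that any PATH changes made by the installer take effect, then try again.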

Install the AKS engine with GoFish on Windows

Install GoFish from the Install page.

  1. From an elevated PowerShell prompt, run the following command:

    Set-ExecutionPolicy Bypass -Scope Process -Force
    iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/fishworks/gofish/master/scripts/install.ps1'))
  2. Run the following command in the same session to install the AKS engine with GoFish:

    gofish install aks-engine

Checklist for common deployment issues

If you encounter errors while deploying a Kubernetes cluster using the AKS engine, check the following:

  1. Are you using the correct service principal (SPN) credentials?
  2. Does the SPN have the "Contributor" role on the Azure Stack Hub subscription?
  3. Do you have a large enough quota in your Azure Stack Hub plan?
  4. Is a patch or upgrade currently being applied to the Azure Stack Hub instance?

For more information, see the Troubleshooting article in the Azure/aks-engine GitHub repo.
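The role check in item 2 of the checklist can be scripted. The sketch below assumes you already have an Azure CLI session signed in to your Azure Stack Hub environment; `has_contributor_role` and the `SPN_CLIENT_ID` variable are hypothetical names used only for illustration.

```shell
# Hypothetical helper: reads role names (one per line) on stdin and
# succeeds only if one of them is exactly "Contributor".
has_contributor_role() {
  grep -qx 'Contributor'
}

# Example usage (assumes SPN_CLIENT_ID holds the service principal's
# application ID and the Azure CLI is signed in to Azure Stack Hub):
#   az role assignment list --assignee "$SPN_CLIENT_ID" \
#     --query '[].roleDefinitionName' -o tsv | has_contributor_role \
#     || echo "SPN is missing the Contributor role" >&2
```

By default the listing covers assignments at the subscription scope, which matches the checklist item. Assignments made at other scopes would need a broader query.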

Collect AKS engine logs

The AKS engine reports status and errors as it runs. You can either pipe the output to a text file or copy it directly from the command-line console. For a list of error codes triggered by the AKS engine, see Review custom script extension error codes.

  1. Gather the standard output and error information displayed by the AKS engine command-line tool.

  2. Get logs from a local file. Use the get-logs command with the --output-directory flag to set the directory where the logs are written.

    To set the local path for the logs:

    aks-engine get-logs --output-directory <path to the directory>
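One way to capture both standard output and standard error in a text file, while still seeing them in the console, is a small wrapper around `tee`. This is a sketch only: `run_logged` is a hypothetical helper, and the AKS engine invocation in the usage comment is a placeholder for whatever command you normally run.

```shell
# Hypothetical helper: run a command, stream its stdout and stderr to the
# console, append both to a log file, and return the command's exit code.
run_logged() {
  log_file=$1
  shift
  status_file=$(mktemp)
  # The inner group records the command's exit code, because the exit
  # status of the pipeline itself would be the status of tee.
  { "$@" 2>&1; echo "$?" > "$status_file"; } | tee -a "$log_file"
  status=$(cat "$status_file")
  rm -f "$status_file"
  return "$status"
}

# Example usage (replace the placeholder with your usual flags):
#   run_logged aks-engine-deploy.log aks-engine deploy <your usual flags>
```

Preserving the exit code matters here because a failed deployment should still be visible to any script that calls the wrapper.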

Collect Kubernetes logs

In addition to the AKS engine logs, the Kubernetes components generate status and error messages. You can collect these logs using the Bash script getkuberneteslogs.sh.

This script automates the process of gathering the following logs:

  • Log files in directory /var/log/azure/
  • Log files in directory /var/log/kubeaudit (kube audit logs)
  • Log file /var/log/waagent.log (waagent)
  • Log file /var/log/azure/deploy-script-dvm.log (if deployed using Azure Stack Hub's Kubernetes Cluster marketplace item)
  • Static manifests in directory /etc/kubernetes/manifests
  • Static addons in directory /etc/kubernetes/addons
  • kube-system containers metadata and logs
  • kubelet status and journal
  • etcd status and journal
  • Docker status and journal
  • kube-system snapshot
  • Azure CNI config files

Some additional logs are retrieved for Windows nodes:

  • Log file c:\Azure\CustomDataSetupScript.log
  • kube-proxy status and journal
  • containerd status and journal
  • azure-vnet log and azure-vnet-telemetry log
  • ETW events for Docker
  • ETW events for Hyper-V

Without this script, you would need to connect to each node in the cluster, then locate and download the logs manually. The script can also, optionally, upload the collected logs to a storage account that you can use to share the logs with others.


To run the script, you need:

  • A Linux VM, Git Bash, or Bash on Windows.
  • Azure CLI installed on the machine where the script will be run.
  • A service principal identity signed in to an Azure CLI session for Azure Stack Hub. Because the script can discover and create Azure Stack Resource Manager resources to do its work, it requires the Azure CLI and a service principal identity.
  • The user account (subscription) that contains the Kubernetes cluster, already selected in the environment.

  1. Download the latest release of the script tar file to your client VM. This can be any machine that has access to your Kubernetes cluster, or the same machine you used to deploy your cluster with the AKS engine.

    Run the following commands:

    mkdir -p $HOME/kuberneteslogs
    cd $HOME/kuberneteslogs
    wget https://github.com/msazurestackworkloads/azurestack-gallery/releases/download/diagnosis-v0.1.5/diagnosis-v0.1.5.tar.gz
    tar xvf diagnosis-v0.1.5.tar.gz -C ./
  2. Look for the parameters required by the getkuberneteslogs.sh script. The script uses the following parameters:

    Parameter | Description | Required | Example
    -h, --help | Print command usage. | no |
    -u, --user | The administrator username for the cluster VMs. | yes | azureuser (default value)
    -i, --identity-file | RSA private key tied to the public key used to create the Kubernetes cluster (sometimes named 'id_rsa'). | yes | ./rsa.pem (Putty), ~/.ssh/id_rsa (SSH)
    -g, --resource-group | Kubernetes cluster resource group. | yes | k8sresourcegroup
    -n, --user-namespace | Collect logs from containers in the specified namespaces (kube-system logs are always collected). | no | monitoring
    --api-model | Persists the apimodel.json file in an Azure Stack Hub storage account. The file is uploaded to the storage account only when the --upload-logs parameter is also provided. | no | ./apimodel.json
    --all-namespaces | Collect logs from containers in all namespaces. Overrides --user-namespace. | no |
    --upload-logs | Persists retrieved logs in an Azure Stack Hub storage account. Logs can be found in the KubernetesLogs resource group. | no |
    --disable-host-key-checking | Sets SSH's StrictHostKeyChecking option to "no" while the script executes. Only use in a safe environment. | no |
  3. Run any of the following example commands with your information:

    ./getkuberneteslogs.sh -u azureuser -i private.key.1.pem -g k8s-rg
    ./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg --disable-host-key-checking
    ./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg -n default -n monitoring
    ./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg --upload-logs --api-model clusterDefinition.json
    ./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg --upload-logs
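Because -u, -i, and -g are all required, a quick check of those three values before calling the script can save a failed run. The sketch below is an illustration only: `check_getlogs_inputs` is a hypothetical helper and is not shipped with getkuberneteslogs.sh.

```shell
# Hypothetical pre-flight check for the three required parameters.
# Returns 2 if a value is missing, 1 if the key file cannot be read.
check_getlogs_inputs() {
  user=${1:-}
  identity_file=${2:-}
  resource_group=${3:-}
  if [ -z "$user" ] || [ -z "$identity_file" ] || [ -z "$resource_group" ]; then
    echo "usage: check_getlogs_inputs <user> <identity-file> <resource-group>" >&2
    return 2
  fi
  if [ ! -r "$identity_file" ]; then
    echo "identity file not readable: $identity_file" >&2
    return 1
  fi
  return 0
}

# Example usage:
#   check_getlogs_inputs azureuser ~/.ssh/id_rsa k8s-rg \
#     && ./getkuberneteslogs.sh -u azureuser -i ~/.ssh/id_rsa -g k8s-rg
```

The check only confirms that the private key file exists and is readable. It does not verify that the key matches the public key used to create the cluster.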

Review custom script extension error codes

You can consult a list of error codes created by the custom script extension (CSE) when running your cluster. The CSE error can be useful in diagnosing the root cause of the problem. The CSE for the Ubuntu server used in your Kubernetes cluster supports many of the AKS engine operations. For more information about the CSE exit codes, see cse_helpers.sh.

Providing Kubernetes logs to an Azure support engineer

If you still cannot resolve your issue after collecting and examining the logs, you may want to start the process of creating a support ticket. Provide the logs that you collected by running getkuberneteslogs.sh with the --upload-logs parameter set.

Contact your Azure Stack Hub operator. Your operator uses the information from your logs to create the support case.

While addressing a support issue, an Azure support engineer may request that your Azure Stack Hub operator collect the Azure Stack Hub system logs. You may need to provide your operator with the information for the storage account where you uploaded the Kubernetes logs by running getkuberneteslogs.sh.

Your operator may run the Get-AzureStackLog PowerShell cmdlet. This command uses a parameter (-InputSaSUri) that specifies the storage account where you stored the Kubernetes logs.

Your operator may combine the logs you produced with any other system logs that Azure support needs and make them available to Azure support.

Open GitHub issues

If you are unable to resolve your deployment error, you can open a GitHub issue.

  1. Open a GitHub issue in the AKS engine repository.

  2. Add a title using the following format: CSE error: exit code <INSERT_YOUR_EXIT_CODE>.

  3. Include the following information in the issue:

    • The cluster configuration file, apimodel json, used to deploy the cluster. Remove all secrets and keys before posting it on GitHub.
    • The output of the kubectl get nodes command.
    • The content of /var/log/azure/cluster-provision.log and /var/log/cloud-init-output.log
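The title format and the redaction step above can be sketched in shell. Both function names are hypothetical, and the key-name pattern used for redaction (names ending in secret, password, or privateKey) is an assumption about what counts as sensitive in your apimodel file, so always review the result by hand before posting.

```shell
# Hypothetical helper: format the issue title from a CSE exit code.
cse_issue_title() {
  printf 'CSE error: exit code %s\n' "$1"
}

# Hypothetical helper: print the file with the string values of keys whose
# names end in secret, password, or privateKey replaced by a placeholder.
# Assumption: these key names cover your secrets. Values that contain
# escaped quotes are not handled correctly by this simple pattern.
redact_apimodel() {
  sed -E 's/("[A-Za-z]*([Ss]ecret|[Pp]assword|[Pp]rivateKey)"[[:space:]]*:[[:space:]]*)"[^"]*"/\1"<REDACTED>"/g' "$1"
}

# Example usage:
#   cse_issue_title 52
#   redact_apimodel apimodel.json > apimodel.redacted.json
```

The redaction writes to standard output instead of editing the file in place, so the original apimodel file stays intact for your own use.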

Next steps