Troubleshooting Azure Kubernetes Service on Azure Stack HCI

When you create or manage a Kubernetes cluster by using Azure Kubernetes Service on Azure Stack HCI, you might occasionally run into problems. This article provides troubleshooting guidelines to help you resolve them.

Troubleshooting Azure Stack HCI

To troubleshoot cluster validation reporting for network and storage QoS (quality of service) settings across servers in an Azure Stack HCI cluster, and to verify that important rules are defined, see Troubleshoot cluster validation reporting.

To troubleshoot problems with CredSSP, see Troubleshoot CredSSP.

Troubleshooting Windows Admin Center

This product is in public preview, which means it's still in development. The Windows Admin Center Azure Kubernetes Service extension currently has the following known issues:

  • Every server in the cluster you use to set up Azure Kubernetes Service on Azure Stack HCI must be a trusted server. Windows Admin Center must therefore be able to run CredSSP operations on every server in the cluster, not just on one or a few of them.

  • If you get an error saying that msft.sme.aks couldn't load and that loading chunks failed, use the latest version of Microsoft Edge or Google Chrome and try again.

  • Before you start either the wizard for setting up the Azure Kubernetes Service host or the wizard for creating a Kubernetes cluster, sign in to Azure through Windows Admin Center. You might need to sign in again during the workflow. If you have difficulty signing in to Azure through Windows Admin Center, try signing in to your Azure account from another source, such as the Azure portal. If problems persist, check the Windows Admin Center known issues article before you contact support.

  • In the current iteration of Azure Kubernetes Service on Azure Stack HCI deployment through Windows Admin Center, only the user who set up the Azure Kubernetes Service host can create Kubernetes clusters on the system. To work around this issue, copy the .wssd folder from the profile of the user who set up the Azure Kubernetes Service host to the profile of the user who will create the new Kubernetes cluster.

  • If either wizard reports a configuration error, perform cluster cleanup operations. These operations might include removing the C:\Program Files\AksHci\mocctl.exe file.

  • For CredSSP to work in the cluster creation wizard, Windows Admin Center must be installed and used by the same account. If you install Windows Admin Center with one account and then use it with another, you'll get errors.

  • During cluster deployment, you might encounter a problem with the helm.zip file transfer. This problem often produces an error saying that the path to the helm.zip file doesn't exist or isn't valid. To resolve it, retry the deployment.

  • If your deployment hangs for an extended period, you might have CredSSP or connectivity problems. Try these steps to troubleshoot your deployment:

    1. On the machine running Windows Admin Center, run the following command in a PowerShell window:

      Enter-PSSession <servername>

    2. If this command succeeds, you can connect to the server and there's no connectivity issue.

    3. If you're having CredSSP problems, run this command to test the trust between the gateway machine and the target machine:

      Enter-PSSession -ComputerName <server> -Credential company\administrator -Authentication CredSSP

      You can also run the following command to test the trust when accessing the local gateway:

      Enter-PSSession -ComputerName localhost -Credential (Get-Credential)

  • If you're using Azure Arc and have multiple tenant IDs, run the following command to specify the tenant you want before deployment. Otherwise, your deployment might fail.

    az login --tenant <tenant>

  • If you've just created a new Azure account and haven't signed in to it on your gateway machine, you might have problems registering your Windows Admin Center gateway with Azure. To mitigate this problem, sign in to your Azure account in another browser tab or window, and then register the Windows Admin Center gateway with Azure.
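PowerShell remoting (Enter-PSSession) connects over WinRM, which by default listens on TCP port 5985 (HTTP) or 5986 (HTTPS). Before you test CredSSP, a quick TCP check from the machine running Windows Admin Center can help separate a network problem from an authentication problem. The Python sketch below assumes the default WinRM ports, and its function names are illustrative. An open port only shows that something is listening; it doesn't prove that remoting or CredSSP is configured correctly.

```python
import socket
from typing import Dict

# Default WinRM listener ports used by PowerShell remoting (assumption: the
# servers have not been reconfigured to use custom ports).
WINRM_HTTP_PORT = 5985
WINRM_HTTPS_PORT = 5986


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_winrm(host: str, timeout: float = 3.0) -> Dict[int, bool]:
    """Report which default WinRM ports accept TCP connections on host."""
    return {
        port: port_is_reachable(host, port, timeout)
        for port in (WINRM_HTTP_PORT, WINRM_HTTPS_PORT)
    }
```

For example, `check_winrm("<servername>")` returns a dict that maps each port to True or False. If neither port is reachable, fix network connectivity before you investigate CredSSP.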

Creating Windows Admin Center logs

When you report problems with Windows Admin Center, attach logs to help the development team diagnose the problem. Errors in Windows Admin Center generally appear in one of two forms:

  • Events that appear in Event Viewer on the machine running Windows Admin Center
  • JavaScript problems that surface in the browser console

To collect logs for Windows Admin Center, use the Get-SMEUILogs.ps1 script that's provided in the public preview package.

To use the script, run this command in the folder where the script is stored:

.\Get-SMEUILogs.ps1 -ComputerNames comp1, comp2 -Destination comp3 -HoursAgo 48 -NoCredentialPrompt

The command has the following parameters:

  • -ComputerNames: A comma-separated list of machines to collect logs from.
  • -Destination: The machine to aggregate the logs to.
  • -HoursAgo: The start time for collecting logs, expressed as the number of hours before you run the script.
  • -NoCredentialPrompt: A switch that turns off the credentials prompt and uses the default credentials in your current environment.
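To make the -HoursAgo semantics concrete: with -HoursAgo 48, the script collects entries logged from 48 hours before the run time onward. The Python sketch below models only that time-window arithmetic. It doesn't read Event Viewer or call Get-SMEUILogs.ps1, and the function names and the (timestamp, message) event shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Hypothetical event shape: (timestamp, message).
Event = Tuple[datetime, str]


def collection_start(hours_ago: int, now: Optional[datetime] = None) -> datetime:
    """Earliest timestamp covered by a given -HoursAgo value."""
    if hours_ago < 0:
        raise ValueError("hours_ago must be non-negative")
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(hours=hours_ago)


def events_in_window(events: Iterable[Event], hours_ago: int,
                     now: Optional[datetime] = None) -> List[Event]:
    """Keep only events logged at or after the collection start time."""
    start = collection_start(hours_ago, now)
    return [event for event in events if event[0] >= start]
```

A larger -HoursAgo value widens the window and produces bigger log bundles, so choose a value that just covers the time when the problem occurred.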

If you have difficulty running this script, run the following command to view its help text and examples:

Get-Help .\Get-SMEUILogs.ps1 -Examples

Troubleshooting Windows worker nodes

To sign in to a Windows worker node, first get the IP address of the node by running kubectl get nodes. Note the EXTERNAL-IP value.

kubectl get nodes -o wide

SSH into the node by using ssh Administrator@<ip>, where <ip> is the EXTERNAL-IP value. After you connect, you can run net user administrator * to update the administrator password.
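If you have many nodes, you can pull the EXTERNAL-IP values out of the kubectl get nodes -o wide output instead of copying them by hand. The Python sketch below assumes that every column up to and including EXTERNAL-IP (NAME, STATUS, ROLES, AGE, VERSION, INTERNAL-IP) contains no spaces, so splitting on whitespace is safe for those columns. It skips nodes that report <none>. The function names are illustrative.

```python
from typing import Dict


def external_ips(wide_output: str) -> Dict[str, str]:
    """Map each node NAME to its EXTERNAL-IP from `kubectl get nodes -o wide`.

    Later columns such as OS-IMAGE can contain spaces; they are never read,
    because only columns up to EXTERNAL-IP are indexed.
    """
    lines = [line for line in wide_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    name_idx = header.index("NAME")
    ip_idx = header.index("EXTERNAL-IP")
    result: Dict[str, str] = {}
    for row in lines[1:]:
        fields = row.split()
        ip = fields[ip_idx]
        if ip != "<none>":  # node has no external address to SSH into
            result[fields[name_idx]] = ip
    return result


def ssh_command(ip: str, user: str = "Administrator") -> str:
    """Build the SSH command from this article (Windows nodes by default)."""
    return f"ssh {user}@{ip}"
```

To use it, capture the output of kubectl get nodes -o wide (for example, with Python's subprocess.run) and pass the text to external_ips.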

Troubleshooting Linux worker nodes

To sign in to a Linux worker node, first get the IP address of the node by running kubectl get nodes. Note the EXTERNAL-IP value.

kubectl get nodes -o wide

SSH into the node by using ssh clouduser@<ip>, where <ip> is the EXTERNAL-IP value.
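The SSH user differs by node OS: Administrator for Windows nodes and clouduser for Linux nodes. The JSON output of kubectl get nodes -o json includes each node's OS (status.nodeInfo.operatingSystem) and addresses (status.addresses, where the EXTERNAL-IP column corresponds to entries of type ExternalIP), so a script can choose the user automatically. The Python sketch below is illustrative. The user mapping comes from this article, and the function name is hypothetical.

```python
import json
from typing import Dict

# SSH user names from this article, keyed by the node's reported OS.
SSH_USERS = {"windows": "Administrator", "linux": "clouduser"}


def ssh_commands(nodes_json: str) -> Dict[str, str]:
    """Map node name to an ssh command, using `kubectl get nodes -o json` output."""
    commands: Dict[str, str] = {}
    for node in json.loads(nodes_json)["items"]:
        status = node["status"]
        user = SSH_USERS.get(status["nodeInfo"]["operatingSystem"])
        # The EXTERNAL-IP column is taken from addresses of type ExternalIP.
        ip = next((addr["address"] for addr in status.get("addresses", [])
                   if addr["type"] == "ExternalIP"), None)
        if user and ip:
            commands[node["metadata"]["name"]] = f"ssh {user}@{ip}"
    return commands
```

Nodes without an ExternalIP address, or with an OS not in the mapping, are skipped rather than given a guessed command.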

Next steps

If you continue to run into problems when you use Azure Kubernetes Service on Azure Stack HCI, you can file bugs through GitHub.