Troubleshoot issues in Azure Stack Hub

This document provides troubleshooting information for Azure Stack Hub integrated environments. For help with the Azure Stack Development Kit, see ASDK Troubleshooting or get help from experts on the Azure Stack Hub MSDN Forum.

Frequently asked questions

These sections include links to docs that cover common questions sent to Azure Support.

Purchase considerations

Updates and diagnostics

Supported operating systems and sizes for guest VMs

Azure Marketplace

Manage capacity


To increase the total available memory capacity for Azure Stack Hub, you can add more memory. In Azure Stack Hub, your physical server is also referred to as a scale unit node. All scale unit nodes that are members of a single scale unit must have the same amount of memory.

Retention period

The retention period setting lets a cloud operator specify a time period in days (between 0 and 9999 days) during which any deleted account can potentially be recovered. The default retention period is set to 0 days. Setting the value to 0 means that any deleted account is immediately out of retention and marked for periodic garbage collection.
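The retention rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not how Azure Stack Hub implements it: the function name `is_within_retention` is hypothetical, and treating the end of the last retention day as the cutoff is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_within_retention(deleted_at, retention_days, now=None):
    """Return True while a deleted account could still be recovered.

    Hypothetical helper: models the documented 0-9999 day setting only.
    """
    if not 0 <= retention_days <= 9999:
        raise ValueError("retention_days must be between 0 and 9999")
    if now is None:
        now = datetime.now(timezone.utc)
    # With retention_days == 0 the cutoff equals the deletion time, so the
    # account is out of retention immediately.
    return now < deleted_at + timedelta(days=retention_days)
```

With the default of 0 days the function returns False for any time at or after the deletion, which matches the statement that such an account is immediately eligible for garbage collection.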

Security, compliance, and identity

Manage RBAC

A user in Azure Stack Hub can be a reader, owner, or contributor for each instance of a subscription, resource group, or service.

If the built-in roles for Azure resources don't meet the specific needs of your organization, you can create your own custom roles. For this tutorial, you create a custom role named Reader Support Tickets using Azure PowerShell.

Manage usage and billing as a CSP

Choose the type of shared services account that you use for Azure Stack Hub. The types of subscriptions that can be used for registration of a multi-tenant Azure Stack Hub are:

  • Cloud Solution Provider
  • Partner Shared Services subscription

Get scale unit metrics

You can use PowerShell to get stamp utilization information without help from Azure Support. To obtain stamp utilization:

  1. Create a privileged endpoint (PEP) session.
  2. Run Test-AzureStack.
  3. Exit the PEP session.
  4. Run Get-AzureStackLog -FilterByRole SeedRing using an Invoke-Command call.
  5. Extract the seedring .zip. You can obtain the validation report from the ERCS folder where you ran Test-AzureStack.

For more information, see Azure Stack Hub Diagnostics.

Troubleshoot virtual machines (VMs)

Reset Linux VM password

If you forget the password for a Linux VM and the Reset password option is not working due to issues with the VMAccess extension, you can reset the password by following these steps:

  1. Choose a Linux VM to use as a recovery VM.

  2. Sign in to the User portal:

    1. Make a note of the VM size, NIC, Public IP, NSG, and data disks.
    2. Stop the impacted VM.
    3. Remove the impacted VM.
    4. Attach the disk from the impacted VM as a data disk on the recovery VM (it may take a couple of minutes for the disk to be available).
  3. Sign in to the recovery VM and run the following commands:

    sudo su -
    mkdir /tempmount
    fdisk -l                     # list disks to find the partition of the attached disk
    mount /dev/sdc2 /tempmount   # adjust /dev/sdc2 as necessary
    chroot /tempmount/
    passwd root                  # substitute root with the user whose password you want to reset
    rm -f /.autorelabel          # remove the .autorelabel file to prevent a time-consuming SELinux relabel of the disk
    exit                         # exit the chroot environment
    umount /tempmount
  4. Sign in to the User portal:

    1. Detach the disk from the recovery VM.
    2. Recreate the VM from the disk.
    3. Be sure to transfer the Public IP from the previous VM, attach the data disks, and so on.

You may also take a snapshot of the original disk and create a new disk from it rather than perform the changes directly on the original disk. For more information, see these topics:

License activation fails for Windows Server 2012 R2 during provisioning

In this case, Windows fails to activate and a watermark appears in the bottom-right corner of the screen. The WaSetup.xml log located under C:\Windows\Panther contains the following event:

<Event time="2019-05-16T21:32:58.660Z" category="ERROR" source="Unattend">
        <Message>InstrumentProcedure: Failed to execute 'Call ConfigureLicensing()'. Will raise error to caller</Message>
        <Description>Could not find the VOLUME_KMSCLIENT product</Description>
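To check a copy of the log for this event without reading it by hand, a plain text search is enough. The sketch below is a hedged example, not an official tool: the helper names are hypothetical, and it assumes the log has been copied off the VM and can be decoded with the given encoding (UTF-8 by default, which may not match the real file).

```python
from pathlib import Path

# Description text taken from the event shown above.
KMS_ERROR = "Could not find the VOLUME_KMSCLIENT product"


def has_kms_client_error(log_text):
    """Return True if the log text contains the activation failure description."""
    return KMS_ERROR in log_text


def check_log_file(path, encoding="utf-8"):
    """Read a copied WaSetup.xml file and look for the failure description.

    The encoding is an assumption; pass a different one if the file
    does not decode cleanly.
    """
    text = Path(path).read_text(encoding=encoding, errors="replace")
    return has_kms_client_error(text)
```

A text search is used instead of an XML parser because only a fragment of the log is shown here, so the overall file structure is not known.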

To activate the license, copy the Automatic Virtual Machine Activation (AVMA) key for the SKU you want to activate.

Edition       AVMA Key
Essentials    K2XGM-NMBT3-2R6Q8-WF2FK-P36R2

On the VM, run the following command:

slmgr /ipk <AVMA_key>

For complete details, see VM Activation.

A Windows Server image and gallery item must be added before deploying VMs in Azure Stack Hub.

I've deleted some VMs, but still see the VHD files on disk

This behavior is by design:

  • When you delete a VM, VHDs aren't deleted. Disks are separate resources in the resource group.
  • When a storage account gets deleted, the deletion is visible immediately through Azure Resource Manager. But the disks it may contain are still kept in storage until garbage collection runs.

If you see "orphan" VHDs, it's important to know if they're part of the folder for a storage account that was deleted. If the storage account wasn't deleted, it's normal that they're still there.

You can read more about configuring the retention threshold and on-demand reclamation in manage storage accounts.

Troubleshoot storage

Storage reclamation

It may take up to 14 hours for reclaimed capacity to show up in the portal. Space reclamation depends on several factors, including the usage percentage of internal container files in the block blob store. Therefore, depending on how much data is deleted, there's no guarantee on the amount of space that can be reclaimed when the garbage collector runs.

Azure Storage Explorer not working with Azure Stack Hub

If you're using an integrated system in a disconnected scenario, it's recommended to use an Enterprise Certificate Authority (CA). Export the root certificate in a Base-64 format and then import it in Azure Storage Explorer. Make sure that you remove the trailing slash (/) from the Resource Manager endpoint. For more information, see Prepare for connecting to Azure Stack Hub.
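If the endpoint value comes from a script or a configuration file, the trailing slash can be stripped before the value is pasted into Azure Storage Explorer. This is a minimal sketch: the function name `normalize_arm_endpoint` is hypothetical, and the URL in the usage example is a placeholder rather than a value from any particular deployment.

```python
def normalize_arm_endpoint(endpoint):
    """Strip surrounding whitespace and any trailing slash from an endpoint URL."""
    return endpoint.strip().rstrip("/")


# Placeholder URL for illustration only; substitute your own
# Resource Manager endpoint.
cleaned = normalize_arm_endpoint("https://management.local.azurestack.external/")
print(cleaned)
```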

Troubleshoot App Service

Create-AADIdentityApp.ps1 script fails

If the Create-AADIdentityApp.ps1 script that's required for App Service fails, be sure to include the required -AzureStackAdminCredential parameter when running the script. For more information, see Prerequisites for deploying App Service on Azure Stack Hub.

Troubleshoot Azure Stack Hub updates

The Azure Stack Hub patch and update process is designed to allow operators to apply update packages in a consistent, streamlined way. While uncommon, issues can occur during the patch and update process. The following steps are recommended should you encounter an issue:

  1. Prerequisites: Be sure that you have followed the Update Activity Checklist and enabled proactive log collection.

  2. Follow the remediation steps in the failure alert created when your update failed.

  3. If you have been unable to resolve your issue, create an Azure Stack Hub support ticket. Be sure you have logs collected for the time span when the issue occurred. If an update fails, either with a critical alert or a warning, it's important that you review the failure and contact Azure Customer Support Services as directed in the alert so that your scale unit does not stay in a failed state for a long time. Leaving a scale unit in a failed update state for an extended period of time can cause additional issues that are more difficult to resolve later.

Common Azure Stack Hub patch and update issues

Applies to: Azure Stack Hub integrated systems


Applicable: This issue applies to all supported releases.

Cause: When attempting to install the Azure Stack Hub update, the update might fail and its state might change to PreparationFailed. For internet-connected systems, this usually indicates that the update package could not be downloaded properly because of a weak internet connection.

Remediation: You can work around this issue by clicking Install now again. If the problem persists, we recommend manually uploading the update package by following the Install updates section.

Occurrence: Common

Update failed: Check and Enforce external key protectors on CSVs

Applicable: This issue applies to all supported releases.

Cause: The baseboard management controller (BMC) password is not set correctly.

Remediation: Update the BMC credential and resume the update.

Warnings and errors reported while update is in progress

Applicable: This issue applies to all supported releases.

Cause: When an Azure Stack Hub update is in the In progress status, warnings and errors may be reported in the portal. Components may time out waiting for other components during the upgrade, resulting in an error. Azure Stack Hub has a mechanism to retry or remediate some tasks that fail because of intermittent errors.

Remediation: While the Azure Stack Hub update is in the In progress status, warnings and errors reported in the portal can be ignored.

Occurrence: Common

2002 update failed

Applicable: This issue applies only to the 2002 release.

Cause: When attempting the 2002 update, the update might fail and provide this message: The private network parameter is missing from cloud parameters. Please use set-azsprivatenetwork cmdlet to set private networkTrace.

Remediation: Set up a private internal network.