Replace a hardware component on an Azure Stack Hub scale unit node

This article describes the general process to replace hardware components that are non hot-swappable. Actual replacement steps vary based on your original equipment manufacturer (OEM) hardware vendor. See your vendor's field replaceable unit (FRU) documentation for detailed steps that are specific to your Azure Stack Hub integrated system.

Note

Firmware leveling is critical for the success of the operation described in this article. Skipping this step can cause system instability, degraded performance, or security threats, or can prevent Azure Stack Hub automation from deploying the operating system. Always consult your hardware partner's documentation when replacing hardware to ensure the applied firmware matches the OEM Version displayed in the Azure Stack Hub administrator portal.

Hardware Partner | Region | URL
Cisco | All | Cisco Integrated System for Azure Stack Hub Operations Guide; Release Notes for Cisco Integrated System for Azure Stack Hub
Dell EMC | All | Cloud for Azure Stack Hub 14G (account and sign-in required); Cloud for Azure Stack Hub 13G (account and sign-in required)
HPE | All | HPE ProLiant for Azure Stack Hub
Lenovo | All | ThinkAgile SXM Best Recipes
Wortmann | | OEM/firmware package; terra Azure Stack Hub documentation (including FRU)
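
The firmware check described in the note above amounts to comparing what is installed against an approved baseline. The following is a minimal sketch of that comparison, not a vendor tool: the function name, component names, and version strings are all hypothetical, and in practice the approved levels come from your hardware partner's documentation.

```python
# Illustrative only: component names and version strings are made up.
def find_firmware_mismatches(approved, installed):
    """Return {component: (installed_version, approved_version)} for every
    component whose installed firmware differs from the approved baseline."""
    mismatches = {}
    for component, approved_version in approved.items():
        installed_version = installed.get(component)  # None if not reported
        if installed_version != approved_version:
            mismatches[component] = (installed_version, approved_version)
    return mismatches


# Hypothetical baseline and hypothetical inventory after a NIC replacement.
approved_baseline = {"BIOS": "2.4.1", "BMC": "1.80", "NIC": "14.27.1"}
installed_after_swap = {"BIOS": "2.4.1", "BMC": "1.80", "NIC": "14.25.0"}

mismatches = find_firmware_mismatches(approved_baseline, installed_after_swap)
for name, (found, wanted) in mismatches.items():
    print(f"{name}: installed {found}, approved {wanted}")
```

An empty result means every listed component is at the approved level; any entry indicates a component that still needs a firmware update before the node is repaired.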

Non hot-swappable components include the following items:

  • CPU*
  • Memory*
  • Motherboard/baseboard management controller (BMC)/video card
  • Disk controller/host bus adapter (HBA)/backplane
  • Network adapter (NIC)
  • Operating system disk*
  • Data drives (drives that don't support hot swap, for example PCI-e add-in cards)*

* These components may support hot swap, depending on the vendor implementation. See your OEM vendor's FRU documentation for detailed steps.

The following flow diagram shows the general FRU process to replace a non hot-swappable hardware component.

Flow chart showing the component replacement flow

* This action may not be required based on the physical condition of the hardware.

** Whether your OEM hardware vendor does the component replacement and updates the firmware could vary based on your support contract.

Review alert information

The Azure Stack Hub health and monitoring system tracks the health of network adapters and data drives controlled by Storage Spaces Direct. It doesn't track other hardware components. For all other hardware components, alerts are raised in the vendor-specific hardware monitoring solution that runs on the hardware lifecycle host.
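
As a quick reference, the split described above can be written as a small lookup. This is only a sketch: the function name and the component labels are hypothetical, and it simplifies by assuming any data drive passed in is one controlled by Storage Spaces Direct.

```python
# Components whose health Azure Stack Hub itself tracks, per the text above.
# Simplification: a "data drive" here is assumed to be controlled by
# Storage Spaces Direct.
TRACKED_BY_AZURE_STACK_HUB = {"network adapter", "data drive"}


def alert_source(component):
    """Return where to look for alerts about the given component type."""
    if component.lower() in TRACKED_BY_AZURE_STACK_HUB:
        return "Azure Stack Hub health and monitoring system"
    return "vendor hardware monitoring solution on the hardware lifecycle host"


for part in ("Network adapter", "Data drive", "CPU", "Memory"):
    print(f"{part}: {alert_source(part)}")
```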

Component replacement process

The following steps provide a high-level overview of the component replacement process. Don't follow these steps without referring to your OEM-provided FRU documentation.

  1. Use the Shutdown action to gracefully shut down the scale unit node. This action may not be required based on the physical condition of the hardware.

  2. In the unlikely case that the shutdown action fails, use the Drain action to put the scale unit node into maintenance mode. This action may not be required based on the physical condition of the hardware.

    Note

    In any case, only one node can be disabled and powered off at a time without breaking Storage Spaces Direct (S2D).

  3. After the scale unit node is in maintenance mode, use the Power off action. This action may not be required based on the physical condition of the hardware.

    Note

    In the unlikely case that the power off action doesn't work, use the baseboard management controller (BMC) web interface instead.

  4. Replace the damaged hardware component. Whether your OEM hardware vendor does the component replacement could vary based on your support contract.

  5. Update the firmware. Follow your vendor-specific firmware update process using the hardware lifecycle host to make sure the replaced hardware component has the approved firmware level applied. Whether your OEM hardware vendor does this step could vary based on your support contract.

  6. Use the Repair action to bring the scale unit node back into the scale unit.

  7. Use the privileged endpoint to check the status of virtual disk repair. With new data drives, a full storage repair job can take multiple hours depending on system load and consumed space.

  8. After the repair action has finished, validate that all active alerts have been automatically closed.
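
The ordering of the steps above, including the fallbacks, can be sketched as a small planning function. This is an illustration of one reading of the procedure, not Azure Stack Hub tooling: the function names, parameters, and node names are hypothetical, and it assumes that Drain and Power off are only needed when Shutdown fails. It does not call any Azure Stack Hub API.

```python
def plan_replacement(node_running=True, shutdown_succeeds=True,
                     power_off_succeeds=True):
    """Return the ordered list of actions for one scale unit node.

    node_running=False models hardware that is already down, in which
    case the shutdown-related actions are not required.
    """
    steps = []
    if node_running:
        steps.append("Shutdown")
        if not shutdown_succeeds:
            # Fallback path: drain into maintenance mode, then power off.
            steps.append("Drain")
            if power_off_succeeds:
                steps.append("Power off")
            else:
                steps.append("Power off via BMC web interface")
    steps += [
        "Replace component",
        "Update firmware",
        "Repair",
        "Check virtual disk repair status",
        "Verify alerts closed",
    ]
    return steps


def can_take_offline(node, offline_nodes):
    """Only one node may be disabled and powered off at a time."""
    return len(set(offline_nodes) - {node}) == 0


# Hypothetical node names.
print(plan_replacement(shutdown_succeeds=False))
print(can_take_offline("Node02", ["Node01"]))
```

The guard function reflects the note under step 2: if any other node is already offline, the replacement on a second node should wait until the first node has been repaired.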

Next steps