Scale unit node actions in Azure Stack Hub

This article describes how to view the status of a scale unit and its nodes. You can run node actions such as power on, power off, shut down, drain, resume, and repair. Typically, you use these node actions during field replacement of parts, or to help recover a node.

Important

Run every node action described in this article against one node at a time.

View the node status

In the administrator portal, you can view the status of a scale unit and its associated nodes.

To view the status of a scale unit:

  1. On the Region management tile, select the region.

  2. On the left, under Infrastructure resources, select Scale units.

  3. In the results, select the scale unit.

  4. On the left, under General, select Nodes.

    View the following information:

    • The list of individual nodes.
    • Operational status (see the list below).
    • Power status (running or stopped).
    • Server model.
    • IP address of the baseboard management controller (BMC).
    • Total number of cores.
    • Total amount of memory.


Node operational states

Status                 Description
Running                The node is actively participating in the scale unit.
Stopped                The node is unavailable.
Adding                 The node is actively being added to the scale unit.
Repairing              The node is actively being repaired.
Maintenance            The node is paused, and no active user workload is running.
Requires Remediation   An error has been detected that requires the node to be repaired.

Azure Stack Hub shows Adding status after an operation

Azure Stack Hub may show the operational node status as Adding after an operation such as drain, resume, repair, shutdown, or start. This can happen when the Fabric Resource Provider role cache didn't refresh after the operation.

Before applying the following steps, make sure that no operation is currently in progress. Update the endpoint to match your environment.

  1. Open PowerShell and add your Azure Stack Hub environment. This requires Azure Stack Hub PowerShell to be installed on your computer.

    Add-AzEnvironment -Name AzureStack -ARMEndpoint https://adminmanagement.local.azurestack.external
    Add-AzAccount -Environment AzureStack
    
  2. Run the following command to restart the Fabric Resource Provider role.

    Restart-AzsInfrastructureRole -Name FabricResourceProvider
    
  3. Verify that the operational status of the affected scale unit node has changed to Running. You can use the administrator portal or the following PowerShell command:

    Get-AzsScaleUnitNode | Format-Table Name, ScaleUnitNodeStatus, PowerState
    
  4. If the node operational status still shows Adding, open a support incident.
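
When a scale unit has several nodes, it can help to list only the nodes that haven't returned to Running. The following is a minimal sketch, not part of the Azure Stack Hub modules: Select-NodeNotRunning is a hypothetical helper name, and the sketch assumes the objects returned by Get-AzsScaleUnitNode expose the Name, ScaleUnitNodeStatus, and PowerState properties used in step 3.

```powershell
# Hypothetical helper (not part of Azs.Fabric.Admin): passes through only the
# nodes whose operational status isn't Running.
# Assumption: each input object has Name, ScaleUnitNodeStatus, and PowerState
# properties, as selected in step 3 of the procedure above.
function Select-NodeNotRunning {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $Node
    )
    process {
        if ($Node.ScaleUnitNodeStatus -ne 'Running') {
            # Keep only the three properties that matter for this check.
            $Node | Select-Object Name, ScaleUnitNodeStatus, PowerState
        }
    }
}

# Usage against a live environment (requires a signed-in admin session):
# Get-AzsScaleUnitNode | Select-NodeNotRunning
```

An empty result suggests every node reports Running. Any node still listed as Adding after the role restart is a candidate for the support incident in step 4.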

Scale unit node actions

When you view information about a scale unit node, you can also perform node actions such as:

  • Start and stop (depending on the current power status).
  • Disable and resume (depending on the operational status).
  • Repair.
  • Shut down.

The operational state of the node determines which options are available.

You need to install the Azure Stack Hub PowerShell modules. The cmdlets for these actions are in the Azs.Fabric.Admin module. To install or verify your installation of PowerShell for Azure Stack Hub, see Install PowerShell for Azure Stack Hub.

Stop

The stop action turns off the node. It's the same as pressing the power button. It doesn't send a shutdown signal to the operating system. For planned stop operations, always try the shutdown operation first.

This action is typically used when a node no longer responds to requests.

To run the stop action, open an elevated PowerShell prompt, and run the following cmdlet:

  Stop-AzsScaleUnitNode -Location <RegionName> -Name <NodeName>

In the unlikely case that the stop action doesn't work, retry the operation. If it fails a second time, use the BMC web interface instead.

For more information, see Stop-AzsScaleUnitNode.

Start

The start action turns on the node. It's the same as pressing the power button.

To run the start action, open an elevated PowerShell prompt, and run the following cmdlet:

  Start-AzsScaleUnitNode -Location <RegionName> -Name <NodeName>

In the unlikely case that the start action doesn't work, retry the operation. If it fails a second time, use the BMC web interface instead.

For more information, see Start-AzsScaleUnitNode.

Drain

The drain action moves all active workloads to the remaining nodes in that particular scale unit.

This action is typically used during field replacement of parts, such as the replacement of an entire node.

Important

Drain a node only during a planned maintenance window for which users have been notified. Under some conditions, active workloads can experience interruptions.

To run the drain action, open an elevated PowerShell prompt, and run the following cmdlet:

  Disable-AzsScaleUnitNode -Location <RegionName> -Name <NodeName>

For more information, see Disable-AzsScaleUnitNode.

Resume

The resume action resumes a disabled node and marks it active for workload placement. Workloads that were previously running on the node don't fail back. (If you drain a node, be sure to power it off. When you power the node back on, it isn't marked as active for workload placement. When ready, you must use the resume action to mark the node as active.)

To run the resume action, open an elevated PowerShell prompt, and run the following cmdlet:

  Enable-AzsScaleUnitNode -Location <RegionName> -Name <NodeName>

For more information, see Enable-AzsScaleUnitNode.

Repair

Caution

Firmware leveling is critical for the success of the operation described in this article. Missing this step can lead to system instability, a decrease in performance, security threats, or failure when Azure Stack Hub automation deploys the operating system. Always consult your hardware partner's documentation when replacing hardware to ensure the applied firmware matches the OEM Version displayed in the Azure Stack Hub administrator portal.

For more information and links to partner documentation, see Replace a hardware component.

Hardware Partner   Region   URL
Cisco              All      Cisco Integrated System for Azure Stack Hub Operations Guide
                            Release Notes for Cisco Integrated System for Azure Stack Hub
Dell EMC           All      Cloud for Azure Stack Hub 14G (account and login required)
                            Cloud for Azure Stack Hub 13G (account and login required)
HPE                All      HPE ProLiant for Azure Stack Hub
Lenovo             All      ThinkAgile SXM Best Recipes

The repair action repairs a node. Use it only in either of the following scenarios:

  • Full node replacement (with or without new data disks).
  • After hardware component failure and replacement (if advised in the field replaceable unit [FRU] documentation).

Important

See your OEM hardware vendor's FRU documentation for exact steps when you need to replace a node or individual hardware components. The FRU documentation specifies whether you need to run the repair action after replacing a hardware component.

When you run the repair action, you need to specify the BMC IP address.

To run the repair action, open an elevated PowerShell prompt, and run the following cmdlet:

  Repair-AzsScaleUnitNode -Location <RegionName> -Name <NodeName> -BMCIPv4Address <BMCIPv4Address>

Shutdown

The shutdown action first moves all active workloads to the remaining nodes in the same scale unit. Then the action gracefully shuts down the scale unit node.

After you start a node that was shut down, you need to run the resume action. Workloads that were previously running on the node don't fail back.

If the shutdown operation fails, attempt the drain operation followed by the shutdown operation.

To run the shutdown action, open an elevated PowerShell prompt, and run the following cmdlet:

  Stop-AzsScaleUnitNode -Location <RegionName> -Name <NodeName> -Shutdown
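
The shutdown, start, and resume actions are normally used together around planned maintenance. The following is a rough sketch of that sequence, using only the cmdlets and parameters shown in this article. The function names Stop-NodeForMaintenance and Restore-NodeToService are hypothetical, and the sketch assumes that each cmdlet returns only after its operation has completed. Verify that assumption in your environment before relying on it.

```powershell
# Hypothetical wrappers; only the Azs.Fabric.Admin cmdlets and their -Location,
# -Name, and -Shutdown parameters come from this article.
# Assumption: each cmdlet blocks until its operation finishes.

function Stop-NodeForMaintenance {
    param(
        [Parameter(Mandatory)] [string] $Location,
        [Parameter(Mandatory)] [string] $Name
    )
    # Shutdown moves active workloads to the remaining nodes,
    # then gracefully shuts down the node.
    Stop-AzsScaleUnitNode -Location $Location -Name $Name -Shutdown
}

function Restore-NodeToService {
    param(
        [Parameter(Mandatory)] [string] $Location,
        [Parameter(Mandatory)] [string] $Name
    )
    # Power the node back on.
    Start-AzsScaleUnitNode -Location $Location -Name $Name
    # A started node isn't active for workload placement until it's resumed.
    Enable-AzsScaleUnitNode -Location $Location -Name $Name
}

# Usage (one node at a time, with placeholder values):
# Stop-NodeForMaintenance -Location <RegionName> -Name <NodeName>
# ... perform the maintenance ...
# Restore-NodeToService -Location <RegionName> -Name <NodeName>
```

The two halves are kept as separate functions because the maintenance work itself happens between them. Checking the node's operational status after each call remains a sensible precaution.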
