Scale unit node actions in Azure Stack Hub

2023-02-10

This article describes how to view the status of a scale unit and its nodes, and how to run node actions such as power on, power off, shut down, drain, resume, and repair. Typically, you use these node actions during field replacement of parts, or to help recover a node.

Important

All node actions described in this article should target one node at a time.

View the node status

In the administrator portal, you can view the status of a scale unit and its associated nodes.

To view the status of a scale unit:

1. On the Region management tile, select the region.
2. On the left, under Infrastructure resources, select Scale units.
3. In the results, select the scale unit.
4. On the left, under General, select Nodes.

View the following information:
- The list of individual nodes.
- Operational Status (see the Node operational states table).
- Power Status (running or stopped).
- Server model.
- IP address of the baseboard management controller (BMC).
- Total number of cores.
- Total amount of memory.
Node actions can also raise expected alerts in the administrator portal.

Node operational states

Status	Description
Running	The node is actively participating in the scale unit.
Stopped	The node is unavailable.
Adding	The node is actively being added to the scale unit.
Repairing	The node is actively being repaired.
Maintenance	The node is paused, and no active user workload is running.
Requires Remediation	An error has been detected that requires the node to be repaired.
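
As a minimal sketch, the states in this table can be used from PowerShell to pick out nodes that need attention. `Select-NodeNotRunning` is a hypothetical helper, not part of any Azure Stack Hub module. It assumes each node object exposes a `ScaleUnitNodeStatus` property and that a healthy node reports the string `Running`, as the portal does; the exact strings returned for the other states may differ from the portal labels.

```powershell
# Hypothetical helper: passes through only nodes whose operational
# status is something other than Running.
function Select-NodeNotRunning {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [object] $Node
    )
    process {
        if ($Node.ScaleUnitNodeStatus -ne 'Running') {
            $Node
        }
    }
}

# Usage against a live environment (requires the Azs.Fabric.Admin module):
# Get-AzsScaleUnitNode | Select-NodeNotRunning |
#     Format-Table Name, ScaleUnitNodeStatus, PowerState
```

An empty result suggests every node is reporting Running.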

Azure Stack Hub shows Adding status after an operation

Azure Stack Hub may show the operational node status as Adding after an operation like drain, resume, repair, shutdown, or start has run. This can happen when the Fabric Resource Provider Role cache doesn't refresh after the operation.

Before applying the following steps, ensure that no operation is currently in progress. Update the endpoint to match your environment.

Az modules

Open PowerShell and add your Azure Stack Hub environment. This requires Azure Stack Hub PowerShell to be installed on your computer.

```
Add-AzEnvironment -Name AzureStack -ARMEndpoint https://adminmanagement.local.azurestack.external
Connect-AzAccount -Environment AzureStack
```

Run the following command to restart the Fabric Resource Provider Role.
```
Restart-AzsInfrastructureRole -Name FabricResourceProvider
```
Validate that the operational status of the impacted scale unit node has changed to Running. You can use the administrator portal or the following PowerShell command:
```
Get-AzsScaleUnitNode | Format-Table Name, ScaleUnitNodeStatus, PowerState
```
If the node's operational status still shows Adding, open a support incident.
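
The status may take a little while to update after the role restarts, so the validation step can be repeated a few times before opening a support incident. The following is a rough sketch of such a polling loop, not an official procedure. `Wait-NodeRunning`, its parameters, the attempt count, and the delay are all illustrative assumptions; it also assumes the node's `Name` property ends with the node name you pass in.

```powershell
# Hypothetical helper: polls node status until the named node reports
# Running, or until the attempts run out. Returns $true or $false.
function Wait-NodeRunning {
    param(
        [Parameter(Mandatory = $true)] [string] $NodeName,
        # Command that returns node objects; swap it out for testing.
        [scriptblock] $GetNodes = { Get-AzsScaleUnitNode },
        [int] $MaxAttempts = 10,
        [int] $DelaySeconds = 30
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # Assumption: Name may carry a prefix, so match on the suffix.
        $node = & $GetNodes |
            Where-Object { $_.Name -like "*$NodeName" } |
            Select-Object -First 1
        if ($node -and $node.ScaleUnitNodeStatus -eq 'Running') {
            return $true
        }
        if ($attempt -lt $MaxAttempts) {
            Start-Sleep -Seconds $DelaySeconds
        }
    }
    return $false
}

# Usage:
# if (-not (Wait-NodeRunning -NodeName '<NodeName>')) {
#     Write-Warning 'Node still not Running; consider opening a support incident.'
# }
```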

AzureRM modules

Open PowerShell and add your Azure Stack Hub environment. This requires Azure Stack Hub PowerShell to be installed on your computer.

```
Add-AzureRMEnvironment -Name AzureStack -ARMEndpoint https://adminmanagement.local.azurestack.external
Add-AzureRMAccount -Environment AzureStack
```

Run the following command to restart the Fabric Resource Provider Role.
```
Restart-AzsInfrastructureRole -Name FabricResourceProvider
```
Validate that the operational status of the impacted scale unit node has changed to Running. You can use the administrator portal or the following PowerShell command:
```
Get-AzsScaleUnitNode | Format-Table Name, ScaleUnitNodeStatus, PowerState
```
If the node's operational status still shows Adding, open a support incident.

Scale unit node actions

When you view information about a scale unit node, you can also perform node actions like:

- Start and stop (depending on current power status).
- Disable and resume (depending on operational status).
- Repair.
- Shutdown.

The operational state of the node determines which options are available.

You need to install Azure Stack Hub PowerShell modules. These cmdlets are in the Azs.Fabric.Admin module. To install or verify your installation of PowerShell for Azure Stack Hub, see Install PowerShell for Azure Stack Hub.

Stop

The stop action turns off the node. It's the same as pressing the power button. It doesn't send a shutdown signal to the operating system. For planned stop operations, always try the shutdown operation first.

This action is typically used when a node no longer responds to requests.

To run the stop action, open an elevated PowerShell prompt, and run the following cmdlet:

```
Stop-AzsScaleUnitNode -Location <RegionName> -Name <NodeName>
```

In the unlikely case that the stop action doesn't work, retry the operation. If it fails a second time, use the BMC web interface instead.

For more information, see Stop-AzsScaleUnitNode.

Start

The start action turns on the node. It's the same as pressing the power button.

To run the start action, open an elevated PowerShell prompt, and run the following cmdlet:

```
Start-AzsScaleUnitNode -Location <RegionName> -Name <NodeName>
```

In the unlikely case that the start action doesn't work, retry the operation. If it fails a second time, use the BMC web interface instead.
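
The "retry once, then fall back to the BMC" guidance for stop and start could be scripted along these lines. This is a sketch only: `Invoke-WithOneRetry` is a hypothetical helper, and it relies on the wrapped command raising a terminating error on failure, which is why the usage example passes `-ErrorAction Stop`.

```powershell
# Hypothetical helper: runs an action, retries it once on failure, and
# returns $true on success or $false after the second failure.
function Invoke-WithOneRetry {
    param(
        [Parameter(Mandatory = $true)] [scriptblock] $Action
    )
    foreach ($attempt in 1..2) {
        try {
            # Discard the action's output so only the result flag is returned.
            $null = & $Action
            return $true
        }
        catch {
            Write-Warning "Attempt $attempt failed: $($_.Exception.Message)"
        }
    }
    return $false
}

# Usage:
# $ok = Invoke-WithOneRetry {
#     Start-AzsScaleUnitNode -Location '<RegionName>' -Name '<NodeName>' -ErrorAction Stop
# }
# if (-not $ok) { Write-Warning 'Use the BMC web interface instead.' }
```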

For more information, see Start-AzsScaleUnitNode.

Drain

The drain action moves all active workloads to the remaining nodes in that particular scale unit.

This action is typically used during field replacement of parts, like the replacement of an entire node.

Important

Drain a node only during a planned maintenance window for which users have been notified. Under some conditions, active workloads can experience interruptions.

To run the drain action, open an elevated PowerShell prompt, and run the following cmdlet:

```
Disable-AzsScaleUnitNode -Location <RegionName> -Name <NodeName>
```

For more information, see Disable-AzsScaleUnitNode.

Resume

The resume action resumes a disabled node and marks it active for workload placement. Workloads that were running on the node earlier don't fail back. (If you drain a node, be sure to power it off. When you power the node back on, it isn't marked as active for workload placement. When ready, you must use the resume action to mark the node as active.)

To run the resume action, open an elevated PowerShell prompt, and run the following cmdlet:

```
Enable-AzsScaleUnitNode -Location <RegionName> -Name <NodeName>
```

For more information, see Enable-AzsScaleUnitNode.
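
Taking a node out for maintenance and bringing it back combines several of the actions in this article. The sketch below shows one possible ordering: drain, graceful shutdown, then later power on and resume. The two function names are hypothetical wrappers; only the `*-AzsScaleUnitNode` cmdlets and their `-Location`, `-Name`, and `-Shutdown` parameters come from this article. It assumes each cmdlet returns only after its operation has finished, which you should verify in your environment.

```powershell
# Hypothetical wrapper: drain the node, then shut it down gracefully.
function Suspend-NodeForMaintenance {
    param(
        [Parameter(Mandatory = $true)] [string] $Location,
        [Parameter(Mandatory = $true)] [string] $Name
    )
    Disable-AzsScaleUnitNode -Location $Location -Name $Name -ErrorAction Stop
    Stop-AzsScaleUnitNode -Location $Location -Name $Name -Shutdown -ErrorAction Stop
}

# Hypothetical wrapper: power the node on, then mark it active again.
# Powering on alone does not make the node eligible for workloads.
function Resume-NodeAfterMaintenance {
    param(
        [Parameter(Mandatory = $true)] [string] $Location,
        [Parameter(Mandatory = $true)] [string] $Name
    )
    Start-AzsScaleUnitNode -Location $Location -Name $Name -ErrorAction Stop
    Enable-AzsScaleUnitNode -Location $Location -Name $Name -ErrorAction Stop
}

# Usage:
# Suspend-NodeForMaintenance -Location '<RegionName>' -Name '<NodeName>'
# ... perform the hardware work ...
# Resume-NodeAfterMaintenance -Location '<RegionName>' -Name '<NodeName>'
```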

Repair

Caution

Firmware leveling is critical for the success of the operation described in this article. Missing this step can lead to system instability, a decrease in performance, security threats, or failure when Azure Stack Hub automation deploys the operating system. Always consult your hardware partner's documentation when replacing hardware to ensure the applied firmware matches the OEM Version displayed in the Azure Stack Hub administrator portal.

For more information and links to partner documentation, see Replace a hardware component.

Hardware Partner	Region	URL
Cisco	All	Cisco Integrated System for Azure Stack Hub Operations Guide; Release Notes for Cisco Integrated System for Azure Stack Hub
Dell EMC	All	Cloud for Azure Stack Hub 14G (account and login required); Cloud for Azure Stack Hub 13G (account and login required)
HPE	All	HPE ProLiant for Azure Stack Hub
Lenovo	All	ThinkAgile SXM Best Recipes

The repair action repairs a node. Use it only for either of the following scenarios:

- Full node replacement (with or without new data disks).
- After hardware component failure and replacement (if advised in the field replaceable unit [FRU] documentation).

Important

See your OEM hardware vendor's FRU documentation for exact steps when you need to replace a node or individual hardware components. The FRU documentation will specify whether you need to run the repair action after replacing a hardware component.

When you run the repair action, you need to specify the BMC IP address.

To run the repair action, open an elevated PowerShell prompt, and run the following cmdlet:

```
Repair-AzsScaleUnitNode -Location <RegionName> -Name <NodeName> -BMCIPv4Address <BMCIPv4Address>
```
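
Because the repair action depends on the BMC IP address, it can help to sanity-check the value before passing it to the cmdlet. The following sketch uses the .NET `System.Net.IPAddress` class; `Test-BmcIPv4Address` is a hypothetical helper and only checks that the string is a well-formed IPv4 address, not that it belongs to the right BMC.

```powershell
# Hypothetical helper: returns $true only for a dotted-quad IPv4 address.
function Test-BmcIPv4Address {
    param(
        [Parameter(Mandatory = $true)] [string] $Address
    )
    # Require exactly four dot-separated parts; TryParse alone also
    # accepts shorthand forms such as "10.1".
    if (($Address -split '\.').Count -ne 4) {
        return $false
    }
    $parsed = $null
    $ok = [System.Net.IPAddress]::TryParse($Address, [ref] $parsed)
    return ($ok -and
        $parsed.AddressFamily -eq [System.Net.Sockets.AddressFamily]::InterNetwork)
}

# Usage:
# $bmc = '<BMCIPv4Address>'
# if (Test-BmcIPv4Address -Address $bmc) {
#     Repair-AzsScaleUnitNode -Location '<RegionName>' -Name '<NodeName>' -BMCIPv4Address $bmc
# }
```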

Shutdown

The shutdown action first moves all active workloads to the remaining nodes in the same scale unit. Then the action gracefully shuts down the scale unit node.

After you start a node that was shut down, you need to run the resume action. Earlier workloads that were running on the node don't fail back.

If the shutdown operation fails, attempt the drain operation followed by the shutdown operation.

To run the shutdown action, open an elevated PowerShell prompt, and run the following cmdlet:

```
Stop-AzsScaleUnitNode -Location <RegionName> -Name <NodeName> -Shutdown
```
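
The fallback described above (if shutdown fails, drain and then shut down again) might look like the following in a script. `Stop-NodeGracefully` is a hypothetical wrapper, and the sketch assumes a failed shutdown surfaces as a terminating error when `-ErrorAction Stop` is used.

```powershell
# Hypothetical wrapper: try a graceful shutdown; if it fails, drain the
# node explicitly and attempt the shutdown once more.
function Stop-NodeGracefully {
    param(
        [Parameter(Mandatory = $true)] [string] $Location,
        [Parameter(Mandatory = $true)] [string] $Name
    )
    try {
        Stop-AzsScaleUnitNode -Location $Location -Name $Name -Shutdown -ErrorAction Stop
    }
    catch {
        Write-Warning "Shutdown failed, draining first: $($_.Exception.Message)"
        Disable-AzsScaleUnitNode -Location $Location -Name $Name -ErrorAction Stop
        Stop-AzsScaleUnitNode -Location $Location -Name $Name -Shutdown -ErrorAction Stop
    }
}

# Usage:
# Stop-NodeGracefully -Location '<RegionName>' -Name '<NodeName>'
```

If the second shutdown attempt also fails, the error propagates to the caller rather than being retried again.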
