Monitor Azure Stack Hub hardware components

The Azure Stack Hub health and monitoring system monitors the status of the storage subsystem and raises alerts as needed. It can also raise alerts for the following hardware components:

  • System fans
  • System temperature
  • Power supply
  • CPUs
  • Memory
  • Boot drives

Note

Before you enable this feature, confirm with your hardware partner that they're ready to support it. Your hardware partner also provides the detailed steps for enabling this feature in the baseboard management controller (BMC).

SNMP listener scenario

An SNMP v3 listener runs on TCP port 162 on all three Emergency Recovery Console Server (ERCS) instances. The BMC must be configured to send SNMP traps to this Azure Stack Hub listener. The ERCS instances host the privileged endpoint (PEP), so you can get the three listener IP addresses from the administrator portal by opening the region properties view and reading the PEP IPs.
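Before configuring the BMC, it can help to confirm that the three PEP IPs accept connections on the listener port from the BMC's network segment. The following is a minimal sketch, assuming the listener accepts TCP connections on port 162 as described above; the helper names `listener_reachable` and `check_all` are hypothetical, and it only tests reachability, not SNMP authentication or trap delivery.

```python
import socket


def listener_reachable(host: str, port: int = 162, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host, connects, and raises OSError
        # (including timeouts and refused connections) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(ips: list[str], port: int = 162, timeout: float = 2.0) -> dict[str, bool]:
    """Map each listener IP (for example, the three PEP IPs) to its reachability."""
    return {ip: listener_reachable(ip, port, timeout) for ip in ips}
```

For example, `check_all(["192.0.2.10", "192.0.2.11", "192.0.2.12"])` uses placeholder documentation addresses; substitute the PEP IPs from the region properties view. Because SNMP traps conventionally travel over UDP, a TCP connect test is only meaningful if your deployment's listener actually accepts TCP on port 162.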

Sending traps to the listener requires authentication, and the traps must use the same credentials that are used to access the BMC itself.

When any of the three ERCS instances receives an SNMP trap on TCP port 162, the trap's OID is matched internally and an alert is raised. The Azure Stack Hub health and monitoring system accepts only OIDs defined by the hardware partner. If an OID is unknown to Azure Stack Hub, the trap isn't matched to an alert.
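The matching behavior can be pictured as a lookup table of partner-defined OIDs: a known OID produces an alert, and an unknown OID is ignored. This is a conceptual sketch only, not the Azure Stack Hub implementation; the OIDs under the private-enterprises arc `1.3.6.1.4.1` and the `Trap`, `Alert`, and `match_trap` names are hypothetical placeholders, since the real OIDs come from your hardware partner.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical partner-defined OIDs (placeholders, not real vendor values).
KNOWN_OIDS: dict[str, str] = {
    "1.3.6.1.4.1.99999.1.1": "System fan failure",
    "1.3.6.1.4.1.99999.1.2": "System temperature out of range",
    "1.3.6.1.4.1.99999.1.3": "Power supply failure",
}


@dataclass
class Trap:
    source_ip: str  # address of the BMC that sent the (already authenticated) trap
    oid: str        # trap OID reported by the BMC


@dataclass
class Alert:
    title: str
    source_ip: str


def match_trap(trap: Trap) -> Optional[Alert]:
    """Return an Alert for a known OID; return None so unknown OIDs raise nothing."""
    title = KNOWN_OIDS.get(trap.oid)
    if title is None:
        return None
    return Alert(title=title, source_ip=trap.source_ip)
```

In this model, an unrecognized OID is silently dropped rather than surfaced as a generic alert, which mirrors the statement that only partner-defined OIDs are accepted.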

After a faulty component is replaced, the BMC sends an event to the SNMP listener that indicates the state change, and the alert then closes automatically in Azure Stack Hub.
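The open-then-close lifecycle can be sketched as a tracker keyed by BMC and component: a fault event opens an alert, and a later healthy-state event for the same key closes it. This is a hypothetical model for illustration; the `AlertTracker` class, its `handle_event` method, and the component names are assumptions, not part of Azure Stack Hub.

```python
from dataclasses import dataclass, field


@dataclass
class AlertTracker:
    """Hypothetical model: at most one open alert per (BMC IP, component) pair."""
    open_alerts: dict[tuple[str, str], str] = field(default_factory=dict)

    def handle_event(self, bmc_ip: str, component: str, healthy: bool, title: str = "") -> None:
        key = (bmc_ip, component)
        if healthy:
            # State-change event after the part is replaced: close the alert, if any.
            self.open_alerts.pop(key, None)
        else:
            # Fault event: open an alert unless one is already open for this key.
            self.open_alerts.setdefault(key, title)
```

In this model an alert closes only when a healthy-state event arrives for the same key. If the BMC never sends that event, for example because it lost its configuration, the alert stays open until someone closes it.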

Note

Existing alerts don't close automatically when the entire node or motherboard is replaced. The same applies when the BMC loses its configuration, for example after a factory reset.

Next steps

防火墙集成Firewall integration