Integrate external monitoring solution with Azure Stack Hub

For external monitoring of the Azure Stack Hub infrastructure, you need to monitor the Azure Stack Hub software, the physical computers, and the physical network switches. Each of these areas offers a method to retrieve health and alert information:

  • Azure Stack Hub software offers a REST-based API to retrieve health and alerts. Because Azure Stack Hub uses software-defined technologies such as Storage Spaces Direct, storage health and alerts are part of software monitoring.
  • Physical computers can make health and alert information available via the baseboard management controllers (BMCs).
  • Physical network devices can make health and alert information available via the SNMP protocol.

Each Azure Stack Hub solution ships with a hardware lifecycle host. This host runs the original equipment manufacturer (OEM) hardware vendor's monitoring software for the physical servers and network devices. Check with your OEM provider whether their monitoring solutions can integrate with existing monitoring solutions in your datacenter.

Important

The external monitoring solution you use must be agentless. You can't install third-party agents inside Azure Stack Hub components.

The following diagram shows traffic flow between an Azure Stack Hub integrated system, the hardware lifecycle host, an external monitoring solution, and an external ticketing/data collection system.

[Diagram showing traffic between Azure Stack Hub, monitoring, and ticketing solutions.]

Note

External monitoring integration directly with physical servers isn't allowed and is actively blocked by access control lists (ACLs). External monitoring integration directly with physical network devices is supported. Check with your OEM provider on how to enable this feature.

This article explains how to integrate Azure Stack Hub with external monitoring solutions such as System Center Operations Manager and Nagios. It also explains how to work with alerts programmatically by using PowerShell or REST API calls.

Integrate with Operations Manager

You can use Operations Manager for external monitoring of Azure Stack Hub. The System Center Management Pack for Azure Stack Hub enables you to monitor multiple Azure Stack Hub deployments with a single Operations Manager instance. The management pack uses the health resource provider and update resource provider REST APIs to communicate with Azure Stack Hub. If you plan to bypass the OEM monitoring software that's running on the hardware lifecycle host, you can install vendor management packs to monitor physical servers. You can also use Operations Manager network device discovery to monitor network switches.

The management pack for Azure Stack Hub provides the following capabilities:

  • You can manage multiple Azure Stack Hub deployments.
  • It supports Azure Active Directory (Azure AD) and Active Directory Federation Services (AD FS).
  • You can retrieve and close alerts.
  • It provides a health dashboard and a capacity dashboard.
  • It includes Auto Maintenance Mode detection for when patch and update (P&U) is in progress.
  • It includes Force Update tasks for deployment and region.
  • You can add custom information to a region.
  • It supports notification and reporting.

To download the System Center Management Pack and the associated user guide, see Download System Center Management Pack for Azure Stack Hub. You can also download it directly from Operations Manager.

For a ticketing solution, you can integrate Operations Manager with System Center Service Manager. The integrated product connector enables bidirectional communication, so you can close an alert in Azure Stack Hub and Operations Manager after you resolve a service request in Service Manager.

The following diagram shows integration of Azure Stack Hub with an existing System Center deployment. You can automate Service Manager further with System Center Orchestrator or Service Management Automation (SMA) to run operations in Azure Stack Hub.

[Diagram showing integration with Operations Manager, Service Manager, and SMA.]

Integrate with Nagios

You can set up and configure the Nagios Plugin for Azure Stack Hub.

A Nagios monitoring plugin was developed together with partner Cloudbase Solutions and is available under the permissive MIT (Massachusetts Institute of Technology) free software license.

The plugin is written in Python and leverages the health resource provider REST API. It offers basic functionality to retrieve and close alerts in Azure Stack Hub. Like the System Center management pack, it enables you to add multiple Azure Stack Hub deployments and to send notifications.

With version 1.2, the Azure Stack Hub – Nagios plugin leverages the Microsoft ADAL library and supports authentication using a service principal with a secret or certificate. Also, the configuration has been simplified to a single configuration file with new parameters. It now supports Azure Stack Hub deployments using Azure AD and AD FS as the identity system.

The plugin works with Nagios 4.x and XI. To download the plugin, see Monitoring Azure Stack Hub Alerts. The download site also includes installation and configuration details.

Requirements for Nagios

  1. The minimum Nagios version is 4.x.

  2. Azure Active Directory Python library. This library can be installed using Python pip.

    sudo pip install adal pyyaml six
    

Install plugin

This section describes how to install the Azure Stack Hub plugin, assuming a default installation of Nagios.

The plugin package contains the following files:

azurestack_plugin.py
azurestack_handler.sh
samples/etc/azurestack.cfg
samples/etc/azurestack_commands.cfg
samples/etc/azurestack_contacts.cfg
samples/etc/azurestack_hosts.cfg
samples/etc/azurestack_services.cfg

  1. Copy the plugin azurestack_plugin.py into the following directory: /usr/local/nagios/libexec.

  2. Copy the handler azurestack_handler.sh into the following directory: /usr/local/nagios/libexec/eventhandlers.

  3. Make sure the plugin file is set to be executable (in a default installation, <PLUGINS_DIR> is /usr/local/nagios/libexec):

    sudo cp azurestack_plugin.py <PLUGINS_DIR>
    sudo chmod +x <PLUGINS_DIR>/azurestack_plugin.py
    

Configure plugin

The following parameters can be configured in the azurestack.cfg file. Parameters in bold must be configured regardless of the authentication model you choose.

For more information on how to create a service principal name (SPN), see Use an app identity to access resources.

| Parameter | Description | Authentication |
| --- | --- | --- |
| **External_domain_fqdn** | External domain FQDN | |
| **region:** | Region name | |
| **tenant_id:** | Tenant ID* | |
| client_id: | Client ID | SPN with secret |
| client_secret: | Client password | SPN with secret |
| client_cert**: | Path to certificate | SPN with certificate |
| client_cert_thumbprint**: | Certificate thumbprint | SPN with certificate |

*Tenant ID isn't required for Azure Stack Hub deployments with AD FS.

** Client secret and client cert are mutually exclusive.
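
The rules in the table and its two footnotes can be checked before the plugin is started. The following is a minimal sketch, not part of the plugin: the function name `validate_plugin_params` and the `uses_adfs` flag are hypothetical, and it assumes the settings have already been loaded into a dictionary keyed by the parameter names shown above. It also assumes that `client_id` accompanies `client_secret`, as the Authentication column suggests.

```python
def validate_plugin_params(params, uses_adfs=False):
    """Return a list of problems found; an empty list means the settings look consistent.

    Hypothetical helper: `params` is a dict keyed by the parameter names
    from the table (for example "region" or "client_secret").
    """
    # Normalize keys so "External_domain_fqdn" and "region:" style names both match.
    p = {str(k).strip().rstrip(":").lower(): v for k, v in params.items()}
    problems = []

    # Bold parameters are always required; tenant_id is not needed with AD FS.
    required = ["external_domain_fqdn", "region"]
    if not uses_adfs:
        required.append("tenant_id")
    for name in required:
        if not p.get(name):
            problems.append("missing " + name)

    # Client secret and client certificate are mutually exclusive.
    has_secret = bool(p.get("client_secret"))
    has_cert = bool(p.get("client_cert") or p.get("client_cert_thumbprint"))
    if has_secret and has_cert:
        problems.append("client_secret and client_cert are mutually exclusive")
    elif has_secret:
        if not p.get("client_id"):
            problems.append("missing client_id")
    elif has_cert:
        for name in ("client_cert", "client_cert_thumbprint"):
            if not p.get(name):
                problems.append("missing " + name)
    else:
        problems.append("no SPN secret or certificate configured")
    return problems
```

A caller could print the returned list and refuse to continue if it isn't empty. The exact on-disk layout of azurestack.cfg isn't described here, so loading the file is left out of the sketch.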

The other configuration files contain optional settings, which can also be configured in Nagios.

Note

Check the location destination in azurestack_hosts.cfg and azurestack_services.cfg.

| Configuration | Description |
| --- | --- |
| azurestack_commands.cfg | Handler configuration; no changes required |
| azurestack_contacts.cfg | Notification settings |
| azurestack_hosts.cfg | Azure Stack Hub deployment naming |
| azurestack_services.cfg | Configuration of the service |

Setup steps

  1. Modify the configuration files.

  2. Copy the modified configuration files into the following folder: /usr/local/nagios/etc/objects.

Update Nagios configuration

The Nagios configuration needs to be updated to ensure the Azure Stack Hub – Nagios plugin is loaded.

  1. Open the following file:

    /usr/local/nagios/etc/nagios.cfg
    
  2. Add the following entries:

    # Load the Azure Stack Hub Plugin Configuration
    cfg_file=/usr/local/nagios/etc/objects/azurestack_contacts.cfg
    cfg_file=/usr/local/nagios/etc/objects/azurestack_commands.cfg
    cfg_file=/usr/local/nagios/etc/objects/azurestack_hosts.cfg
    cfg_file=/usr/local/nagios/etc/objects/azurestack_services.cfg
    
  3. Reload Nagios.

    sudo service nagios reload
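
Before reloading, it can help to confirm that all four `cfg_file` entries from step 2 are present. The sketch below is one possible check, not part of the plugin: the function name `missing_cfg_entries` is hypothetical, and the default `objects_dir` assumes the lowercase /usr/local/nagios/etc/objects folder used in the setup steps.

```python
# File names taken from the cfg_file entries in step 2.
EXPECTED_FILES = [
    "azurestack_contacts.cfg",
    "azurestack_commands.cfg",
    "azurestack_hosts.cfg",
    "azurestack_services.cfg",
]


def missing_cfg_entries(nagios_cfg_text, objects_dir="/usr/local/nagios/etc/objects"):
    """Return the expected cfg_file paths that the given nagios.cfg text doesn't load."""
    loaded = set()
    for line in nagios_cfg_text.splitlines():
        line = line.strip()
        # Skip comments and lines that aren't key=value directives.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "cfg_file":
            loaded.add(value.strip())
    expected = [objects_dir.rstrip("/") + "/" + name for name in EXPECTED_FILES]
    return [path for path in expected if path not in loaded]
```

For example, `missing_cfg_entries(open("/usr/local/nagios/etc/nagios.cfg").read())` should return an empty list once step 2 is complete. Because the comparison is case-sensitive, it also catches a path typed with a different capitalization than the folder on disk.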
    

Manually close active alerts

Active alerts can be closed within Nagios using the custom notification functionality. The custom notification must be:

/close-alert <ALERT_GUID>

An alert can also be closed from a terminal with the following command:

/usr/local/nagios/libexec/azurestack_plugin.py --config-file /usr/local/nagios/etc/objects/azurestack.cfg --action Close --alert-id <ALERT_GUID>

Troubleshooting

To troubleshoot the plugin, call it manually in a terminal. Use the following command:

/usr/local/nagios/libexec/azurestack_plugin.py --config-file /usr/local/nagios/etc/objects/azurestack.cfg --action Monitor
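
If you call the plugin from your own tooling, the two command lines shown above (Monitor and Close) can be assembled programmatically. This is a minimal sketch under stated assumptions: the helper names `build_plugin_command` and `run_plugin` are hypothetical, the paths are the default-installation paths from this article, and only the `--config-file`, `--action`, and `--alert-id` options shown above are used.

```python
import subprocess

# Default-installation paths used in this article.
PLUGIN = "/usr/local/nagios/libexec/azurestack_plugin.py"
CONFIG = "/usr/local/nagios/etc/objects/azurestack.cfg"


def build_plugin_command(action, alert_id=None, plugin=PLUGIN, config=CONFIG):
    """Build the argument list for the Monitor or Close action."""
    if action not in ("Monitor", "Close"):
        raise ValueError("only the Monitor and Close actions are covered by this sketch")
    cmd = [plugin, "--config-file", config, "--action", action]
    if action == "Close":
        if not alert_id:
            raise ValueError("the Close action requires an alert GUID")
        cmd += ["--alert-id", alert_id]
    return cmd


def run_plugin(action, alert_id=None):
    """Run the plugin and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(
        build_plugin_command(action, alert_id),
        capture_output=True,  # requires Python 3.7 or later
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

Passing the arguments as a list avoids shell quoting issues. `run_plugin("Monitor")` mirrors the troubleshooting call, and the returned output can be logged for diagnosis.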

Use PowerShell to monitor health and alerts

If you're not using Operations Manager, Nagios, or a Nagios-based solution, you can use PowerShell to enable a broad range of monitoring solutions to integrate with Azure Stack Hub.

  1. To use PowerShell, make sure that you have PowerShell installed and configured for an Azure Stack Hub operator environment. Install PowerShell on a local computer that can reach the Resource Manager (administrator) endpoint (https://adminmanagement.[region].[External_FQDN]).

  2. Run the following commands to connect to the Azure Stack Hub environment as an Azure Stack Hub operator:

    Add-AzureRMEnvironment -Name "AzureStackAdmin" -ArmEndpoint https://adminmanagement.[Region].[External_FQDN] `
       -AzureKeyVaultDnsSuffix adminvault.[Region].[External_FQDN] `
       -AzureKeyVaultServiceEndpointResourceId https://adminvault.[Region].[External_FQDN]
    
    Connect-AzureRmAccount -EnvironmentName "AzureStackAdmin"
    
  3. Use commands such as the following examples to work with alerts:

     # Retrieve all alerts
     $Alerts = Get-AzsAlert
     $Alerts
    
     # Filter for active alerts
     $Active = $Alerts | Where-Object { $_.State -eq "active" }
     $Active
    
     # Close alert
     Close-AzsAlert -AlertID "ID"
    
     # Retrieve resource provider health
     $RPHealth = Get-AzsRPHealth
     $RPHealth
    
     # Retrieve infrastructure role instance health
     $FRPID = $RPHealth | Where-Object { $_.DisplayName -eq "Capacity" }
     Get-AzsRegistrationHealth -ServiceRegistrationId $FRPID.RegistrationId
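
The three values passed to Add-AzureRMEnvironment in step 2 all follow the same [Region].[External_FQDN] pattern. If a monitoring tool needs to derive them for several deployments, a small helper keeps them consistent. This is an illustrative sketch only: the function name `admin_environment_settings` is hypothetical, and the region and FQDN in the test values are example inputs, not values from your deployment.

```python
def admin_environment_settings(region, external_fqdn):
    """Derive the operator environment values from a region name and external FQDN."""
    # Tolerate stray whitespace and leading or trailing dots in the inputs.
    base = region.strip().strip(".") + "." + external_fqdn.strip().strip(".")
    return {
        "ArmEndpoint": "https://adminmanagement." + base,
        "AzureKeyVaultDnsSuffix": "adminvault." + base,
        "AzureKeyVaultServiceEndpointResourceId": "https://adminvault." + base,
    }
```

The dictionary keys match the parameter names used in step 2, so the values can be passed on to whatever generates the PowerShell call.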
    

Learn more

For information about built-in health monitoring, see Monitor health and alerts in Azure Stack Hub.

Next steps

Security integration