Monitoring Azure IoT Hub

When you have critical applications and business processes that rely on Azure resources, you want to monitor those resources for availability, performance, and operation. This article describes the monitoring data generated by Azure IoT Hub and how you can use the features of Azure Monitor to analyze and alert on this data.

Monitor overview

The Overview page in the Azure portal for each IoT hub includes charts that provide some usage metrics, such as the number of messages used and the number of devices connected to the IoT hub.

Default metrics charts on the IoT hub Overview page.

Be aware that the message count value can be delayed by 1 minute and that, for reasons related to the IoT Hub service infrastructure, the value can sometimes bounce between higher and lower values on refresh. The counter should be incorrect only for values accrued over the last minute.

The information presented on the Overview pane is useful, but it represents only a small amount of the monitoring data that is available for an IoT hub. Some monitoring data is collected automatically and is available for analysis as soon as you create your IoT hub. You can enable additional types of data collection with some configuration.

What is Azure Monitor?

Azure IoT Hub creates monitoring data using Azure Monitor, a full-stack monitoring service in Azure that provides a complete set of features to monitor your Azure resources, as well as resources in other clouds and on-premises.

Start with the article Monitoring Azure resources with Azure Monitor, which describes the following concepts:

  • What is Azure Monitor?
  • Costs associated with monitoring
  • Monitoring data collected in Azure
  • Configuring data collection
  • Standard tools in Azure for analyzing and alerting on monitoring data

The following sections build on that article by describing the specific data gathered for Azure IoT Hub and providing examples for configuring data collection and analyzing this data with Azure tools.

Monitoring data

Azure IoT Hub collects the same kinds of monitoring data as other Azure resources, as described in Monitoring data from Azure resources.

See Monitoring Azure IoT Hub data reference for detailed information on the metrics and logs created by Azure IoT Hub.

Important

The events emitted by the IoT Hub service using Azure Monitor resource logs are not guaranteed to be reliable or ordered. Some events might be lost or delivered out of order. Resource logs also aren't meant to be real-time, and it may take several minutes for events to be logged to your choice of destination.
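Because events can arrive out of order, a consumer may want to reorder each batch by the timestamp carried in the record rather than by arrival order. The following is a minimal sketch of that idea, not a prescribed approach. It assumes each event is a dictionary whose `time` field holds an ISO 8601 UTC timestamp; the helper name `order_by_time` and the sample batch are hypothetical. Sorting cannot recover events that were lost.

```python
from datetime import datetime


def order_by_time(events):
    """Return a new list of events sorted by their `time` field, oldest first."""
    def event_time(event):
        # Replace a trailing "Z" so that datetime.fromisoformat accepts the value
        # on Python versions that do not parse the "Z" suffix directly.
        return datetime.fromisoformat(event["time"].replace("Z", "+00:00"))

    return sorted(events, key=event_time)


# Hypothetical batch that arrived out of order.
batch = [
    {"time": "2021-03-01T10:00:07Z", "operationName": "deviceDisconnect"},
    {"time": "2021-03-01T10:00:02Z", "operationName": "deviceConnect"},
]
for event in order_by_time(batch):
    print(event["time"], event["operationName"])
```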

Collection and routing

Platform metrics and the Activity log are collected and stored automatically, but can be routed to other locations by using a diagnostic setting.

Resource logs are not collected and stored until you create a diagnostic setting and route them to one or more locations.

Metrics and logs can be routed to several locations, including:

  • The Azure Monitor Logs store via an associated Log Analytics workspace. There they can be analyzed using Log Analytics.
  • Azure Storage for archiving and offline analysis.
  • An Event Hubs endpoint where they can be read by external applications, for example, third-party SIEM tools.

In the Azure portal, select Diagnostic settings under Monitoring in the left pane of your IoT hub, and then select Add diagnostic setting to create diagnostic settings scoped to the logs and platform metrics emitted by your IoT hub.

The following screenshot shows a diagnostic setting for routing the resource log type Connection Operations and all platform metrics to a Log Analytics workspace.

The Diagnostic settings pane for an IoT hub.

See Create diagnostic setting to collect platform logs and metrics in Azure for the detailed process for creating a diagnostic setting using the Azure portal, CLI, or PowerShell. When you create a diagnostic setting, you specify which categories of logs to collect. The categories for Azure IoT Hub are listed under Resource logs in the Monitoring Azure IoT Hub data reference.

When routing IoT Hub platform metrics to other locations, be aware that:

  • The following platform metrics are not exportable via diagnostic settings: Connected devices (preview) and Total devices (preview).

  • Multi-dimensional metrics, for example some routing metrics, are currently exported as flattened single-dimensional metrics aggregated across dimension values. For more detail, see Exporting platform metrics to other locations.
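To illustrate what flattening means for a multi-dimensional metric, the sketch below collapses per-dimension samples into a single value per timestamp. It is only an illustration: the `endpoint` dimension, the sample values, and the helper `flatten_metric` are hypothetical, and summing is assumed as the aggregation, which suits count-style metrics but may not match every metric.

```python
from collections import defaultdict


def flatten_metric(samples):
    """Sum per-dimension samples into one value per timestamp."""
    totals = defaultdict(int)
    for sample in samples:
        # The dimension value is ignored: only the timestamp identifies a point.
        totals[sample["timestamp"]] += sample["value"]
    return dict(totals)


# Hypothetical routing metric split by an "endpoint" dimension.
samples = [
    {"timestamp": "10:00", "endpoint": "storage", "value": 40},
    {"timestamp": "10:00", "endpoint": "eventhub", "value": 60},
    {"timestamp": "10:01", "endpoint": "storage", "value": 10},
]
print(flatten_metric(samples))
```

After export, the per-endpoint breakdown is no longer available, so any analysis that needs it has to be done in metrics explorer before the data is flattened.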

Analyzing metrics

You can analyze metrics for Azure IoT Hub alongside metrics from other Azure services using metrics explorer by opening Metrics from the Azure Monitor menu. See Getting started with Azure Metrics Explorer for details on using this tool.

In the Azure portal, select Metrics under Monitoring in the left pane of your IoT hub to open metrics explorer scoped, by default, to the platform metrics emitted by your IoT hub:

The metrics explorer page for an IoT hub.

For a list of the platform metrics collected for Azure IoT Hub, see Metrics in the Monitoring Azure IoT Hub data reference. For a list of the platform metrics collected for all Azure services, see Supported metrics with Azure Monitor.

For IoT Hub platform metrics that are collected in units of count, some aggregations may not be available or usable. To learn more, see Supported aggregations in the Monitoring Azure IoT Hub data reference.

Some IoT Hub metrics, like routing metrics, are multi-dimensional. For these metrics, you can apply filters and splitting to your charts based on a dimension.

Analyzing logs

Data in Azure Monitor Logs is stored in tables, where each table has its own set of unique properties. The data in these tables is associated with a Log Analytics workspace and can be queried in Log Analytics. To learn more about Azure Monitor Logs, see Azure Monitor Logs overview in the Azure Monitor documentation.

To route data to Azure Monitor Logs, you must create a diagnostic setting to send resource logs or platform metrics to a Log Analytics workspace. To learn more, see Collection and routing.

In the Azure portal, select Logs under Monitoring in the left pane of your IoT hub to perform Log Analytics queries scoped, by default, to the logs and metrics collected in Azure Monitor Logs for your IoT hub.

The Logs page for an IoT hub.

For a list of the tables used by Azure Monitor Logs and queryable by Log Analytics, see Azure Monitor Logs tables in the Monitoring Azure IoT Hub data reference.

All resource logs in Azure Monitor have the same fields followed by service-specific fields. The common schema is outlined in Azure Monitor resource log schema. You can find the schema and categories of resource logs collected for Azure IoT Hub in Resource logs in the Monitoring Azure IoT Hub data reference.

The Activity log is a platform log in Azure that provides insight into subscription-level events. You can view it independently or route it to Azure Monitor Logs, where you can run much more complex queries using Log Analytics.

When routing IoT Hub platform metrics to Azure Monitor Logs, be aware that:

  • The following platform metrics are not exportable via diagnostic settings: Connected devices (preview) and Total devices (preview).

  • Multi-dimensional metrics, for example some routing metrics, are currently exported as flattened single-dimensional metrics aggregated across dimension values. For more detail, see Exporting platform metrics to other locations.

For some common queries with IoT Hub, see Sample Kusto queries. For detailed information on using Log Analytics queries, see Overview of log queries in Azure Monitor.

SDK version in IoT Hub logs

Some operations in IoT Hub resource logs return an sdkVersion property in their properties object. For these operations, when a device or backend app is using one of the Azure IoT SDKs, this property contains information about the SDK being used, the SDK version, and the platform on which the SDK is running. The following example shows the sdkVersion property emitted for a deviceConnect operation when using the Node.js device SDK: "azure-iot-device/1.17.1 (node v10.16.0; Windows_NT 10.0.18363; x64)". Here's an example of the value emitted for the .NET (C#) SDK: ".NET/1.21.2 (.NET Framework 4.8.4200.0; Microsoft Windows 10.0.17763 WindowsProduct:0x00000004; X86)".

The following table shows the SDK name used for different Azure IoT SDKs:

| SDK name in sdkVersion property | Language |
| --- | --- |
| .NET | .NET (C#) |
| microsoft.azure.devices | .NET (C#) service SDK |
| microsoft.azure.devices.client | .NET (C#) device SDK |
| iothubclient | C or Python v1 (deprecated) device SDK |
| iothubserviceclient | C or Python v1 (deprecated) service SDK |
| azure-iot-device-iothub-py | Python device SDK |
| azure-iot-device | Node.js device SDK |
| azure-iothub | Node.js service SDK |
| com.microsoft.azure.iothub-java-client | Java device SDK |
| com.microsoft.azure.iothub.service.sdk | Java service SDK |
| com.microsoft.azure.sdk.iot.iot-device-client | Java device SDK |
| com.microsoft.azure.sdk.iot.iot-service-client | Java service SDK |
| C | Embedded C |
| C + (OSSimplified = Azure RTOS) | Azure RTOS |
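Once sdkVersion values have been exported, it can be handy to split them into the SDK name, the SDK version, and the platform details. The sketch below does this for the `name/version (detail; detail; ...)` shape of the two example values shown earlier. The helper `parse_sdk_version` and the `SdkInfo` type are hypothetical, and the pattern is an assumption based only on those examples; values in other shapes (such as the Azure RTOS name in the table) are not handled and return `None`.

```python
import re
from typing import NamedTuple, Optional, Tuple


class SdkInfo(NamedTuple):
    name: str
    version: str
    platform: Tuple[str, ...]


# name, a slash, a version without spaces, then optional details in parentheses.
_SDK_VERSION = re.compile(
    r"^(?P<name>[^/]+)/(?P<version>\S+)\s*(?:\((?P<platform>.*)\))?\s*$"
)


def parse_sdk_version(value: str) -> Optional[SdkInfo]:
    """Split an sdkVersion string into name, version, and platform details."""
    match = _SDK_VERSION.match(value.strip())
    if match is None:
        return None
    details = match.group("platform") or ""
    # Platform details are separated by semicolons in the documented examples.
    platform = tuple(part.strip() for part in details.split(";") if part.strip())
    return SdkInfo(match.group("name"), match.group("version"), platform)


info = parse_sdk_version(
    "azure-iot-device/1.17.1 (node v10.16.0; Windows_NT 10.0.18363; x64)"
)
print(info.name, info.version, info.platform)
```

The name returned by the parser can then be looked up in the table above to identify the SDK family.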

You can extract the SDK version property when you perform queries against IoT Hub resource logs. For example, the following query extracts the SDK version property (and device ID) from the properties returned by Connections operations. These two properties are written to the results along with the time of the operation and the resource ID of the IoT hub that the device is connecting to.

// SDK version of devices
// List of devices and their SDK versions that connect to IoT Hub
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
| where Category == "Connections"
| extend parsed_json = parse_json(properties_s) 
| extend SDKVersion = tostring(parsed_json.sdkVersion) , DeviceId = tostring(parsed_json.deviceId)
| distinct DeviceId, SDKVersion, TimeGenerated, _ResourceId

Sample Kusto queries

Important

When you select Logs from the IoT hub menu, Log Analytics opens with the query scope set to the current IoT hub. This means that log queries include only data from that resource. If you want to run a query that includes data from other IoT hubs or from other Azure services, select Logs from the Azure Monitor menu. See Log query scope and time range in Azure Monitor Log Analytics for details.

The following queries can help you monitor your IoT hub.

  • Connectivity Errors: Identify device connection errors.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Category == "Connections" and Level == "Error"
    
  • Throttling Errors: Identify devices that made the most requests resulting in throttling errors.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where ResultType == "429001"
    | extend DeviceId = tostring(parse_json(properties_s).deviceId)
    | summarize count() by DeviceId, Category, _ResourceId
    | order by count_ desc
    
  • Dead Endpoints: Identify dead or unhealthy endpoints by the number of times the issue was reported, as well as the reason why.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Category == "Routes" and OperationName in ("endpointDead", "endpointUnhealthy")
    | extend parsed_json = parse_json(properties_s)
    | extend Endpoint = tostring(parsed_json.endpointName), Reason = tostring(parsed_json.details) 
    | summarize count() by Endpoint, OperationName, Reason, _ResourceId
    | order by count_ desc
    
  • Error summary: Count of errors across all operations by type.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Level == "Error"
    | summarize count() by ResultType, ResultDescription, Category, _ResourceId
    
  • Recently connected devices: List of devices that IoT Hub saw connect in the specified time period.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Category == "Connections" and OperationName == "deviceConnect"
    | extend DeviceId = tostring(parse_json(properties_s).deviceId)
    | summarize max(TimeGenerated) by DeviceId, _ResourceId
    
  • SDK version of devices: List of devices and their SDK versions for device connections or device-to-cloud twin operations.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Category == "Connections" or Category == "D2CTwinOperations"
    | extend parsed_json = parse_json(properties_s)
    | extend SDKVersion = tostring(parsed_json.sdkVersion) , DeviceId = tostring(parsed_json.deviceId)
    | distinct DeviceId, SDKVersion, TimeGenerated, _ResourceId
    

Read logs from Azure Event Hubs

After you set up event logging through diagnostic settings, you can create applications that read the logs so that you can act on the information in them. The following sample code retrieves logs from an event hub:

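// Assumes the legacy Microsoft.ServiceBus.Messaging client (WindowsAzure.ServiceBus package)
// and a reference to System.Web.Extensions for JavaScriptSerializer.
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Web.Script.Serialization;
using Microsoft.ServiceBus.Messaging;

class Program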
{ 
    static string connectionString = "{your AMS eventhub endpoint connection string}";
    static string monitoringEndpointName = "{your AMS event hub endpoint name}";
    static EventHubClient eventHubClient;
    // Shape of a log record delivered through a diagnostic setting.
    // The properties are public so that JavaScriptSerializer can populate them.
    class AzureMonitorDiagnosticLog
    {
        public string time { get; set; }
        public string resourceId { get; set; }
        public string operationName { get; set; }
        public string category { get; set; }
        public string level { get; set; }
        public string resultType { get; set; }
        public string resultDescription { get; set; }
        public string durationMs { get; set; }
        public string callerIpAddress { get; set; }
        public string correlationId { get; set; }
        public string identity { get; set; }
        public string location { get; set; }
        public Dictionary<string, string> properties { get; set; }
    }

    static void Main(string[] args)
    {
        Console.WriteLine("Monitoring. Press Enter key to exit.\n");
        eventHubClient = EventHubClient.CreateFromConnectionString(connectionString, monitoringEndpointName);
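        // Main is synchronous, so block on the async call to obtain the partition IDs.
        var d2cPartitions = eventHubClient.GetRuntimeInformationAsync().GetAwaiter().GetResult().PartitionIds;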
        CancellationTokenSource cts = new CancellationTokenSource();
        var tasks = new List<Task>();
        foreach (string partition in d2cPartitions)
        {
            tasks.Add(ReceiveMessagesFromDeviceAsync(partition, cts.Token));
        }
        Console.ReadLine();
        Console.WriteLine("Exiting...");
        cts.Cancel();
        Task.WaitAll(tasks.ToArray());
    }

    private static async Task ReceiveMessagesFromDeviceAsync(string partition, CancellationToken ct)
    {
        var eventHubReceiver = eventHubClient.GetDefaultConsumerGroup().CreateReceiver(partition, DateTime.UtcNow);
        while (true)
        {
            if (ct.IsCancellationRequested)
            {
                await eventHubReceiver.CloseAsync();
                break;
            }
            EventData eventData = await eventHubReceiver.ReceiveAsync(new TimeSpan(0,0,10));
            if (eventData != null)
            {
                string data = Encoding.UTF8.GetString(eventData.GetBytes());
                Console.WriteLine("Message received. Partition: {0} Data: '{1}'", partition, data);
                var deserializer = new JavaScriptSerializer();
                // Deserialize the JSON payload into the Azure Monitor log object.
                AzureMonitorDiagnosticLog message = deserializer.Deserialize<AzureMonitorDiagnosticLog>(data);
            }
        }
    }
}
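As a language-neutral companion to the sample above, the following sketch shows one way to unpack a message body once it has been read from the event hub. It rests on two assumptions that you should check against your own data: that the body wraps log entries in a top-level `records` array, and that `properties` may arrive either as a JSON object or as a JSON-encoded string. The helper `extract_records` and the sample payload are hypothetical; the field names come from the schema class in the sample.

```python
import json


def extract_records(payload: str):
    """Return the log records in a message body, with `properties` decoded."""
    body = json.loads(payload)
    # Assumption: entries are wrapped in a "records" array; fall back to
    # treating the body as a single record (or a bare list of records).
    if isinstance(body, dict):
        records = body.get("records", [body])
    else:
        records = body

    decoded = []
    for record in records:
        properties = record.get("properties", {})
        if isinstance(properties, str):
            try:
                properties = json.loads(properties)
            except json.JSONDecodeError:
                # Keep undecodable text instead of dropping it.
                properties = {"raw": properties}
        decoded.append({**record, "properties": properties})
    return decoded


# Hypothetical message body with a single connection event.
sample = json.dumps({
    "records": [{
        "time": "2021-03-01T10:00:02Z",
        "category": "Connections",
        "operationName": "deviceConnect",
        "level": "Information",
        "properties": json.dumps({"deviceId": "device-01"}),
    }]
})
for record in extract_records(sample):
    print(record["operationName"], record["properties"].get("deviceId"))
```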

Alerts

Azure Monitor alerts proactively notify you when important conditions are found in your monitoring data. They allow you to identify and address issues in your system before your customers notice them. You can set alerts on metrics, logs, and the activity log. Different types of alerts have benefits and drawbacks.

When creating an alert rule based on platform metrics, be aware that for IoT Hub platform metrics that are collected in units of count, some aggregations may not be available or usable. To learn more, see Supported aggregations in the Monitoring Azure IoT Hub data reference.

Next steps