Monitor the health of Azure IoT Hub and diagnose problems quickly

Businesses that implement Azure IoT Hub expect reliable performance from their resources. To help you maintain a close watch on your operations, IoT Hub is fully integrated with Azure Monitor and Azure Resource Health. These two services work together to provide the data you need to keep your IoT solutions up and running in a healthy state.

Azure Monitor is a single source of monitoring and logging for all your Azure services. You can send the diagnostic logs that Azure Monitor generates to Azure Monitor logs, Event Hubs, or Azure Storage for custom processing. Azure Monitor's metrics and diagnostics settings give you visibility into the performance of your resources. Continue reading this article to learn how to use Azure Monitor with your IoT hub.

Important

The events emitted by the IoT Hub service using Azure Monitor diagnostic logs are not guaranteed to be reliable or ordered. Some events might be lost or delivered out of order. Diagnostic logs also aren't meant to be real-time, and it may take several minutes for events to be logged to your choice of destination.

Azure Resource Health helps you diagnose and get support when an Azure issue impacts your resources. A dashboard provides current and past health status for each of your IoT hubs. Continue to the section at the bottom of this article to learn how to use Azure Resource Health with your IoT hub.

IoT Hub also provides its own metrics that you can use to understand the state of your IoT resources. To learn more, see Understand IoT Hub metrics.

Use Azure Monitor

Azure Monitor provides diagnostics information for Azure resources, which means that you can monitor operations that take place within your IoT hub.

Azure Monitor's diagnostics settings replace the IoT Hub operations monitor. If you currently use operations monitoring, you should migrate your workflows. For more information, see Migrate from operations monitoring to diagnostics settings.

To learn more about the specific metrics and events that Azure Monitor watches, see Supported metrics with Azure Monitor and Supported services, schemas, and categories for Azure Diagnostic Logs.

Enable logging with diagnostics settings

  1. Sign in to the Azure portal and navigate to your IoT hub.

  2. Select Diagnostics settings.

  3. Select Turn on diagnostics.

  4. Give the diagnostic setting a name.

  5. Choose where you want to send the logs. You can select any combination of the three options:

    • Archive to a storage account
    • Stream to an event hub
    • Send to Log Analytics
  6. Choose which operations you want to monitor, and enable logs for those operations. The operations that diagnostic settings can report on are:

    • Connections
    • Device telemetry
    • Cloud-to-device messages
    • Device identity operations
    • File uploads
    • Message routing
    • Cloud-to-device twin operations
    • Device-to-cloud twin operations
    • Twin operations
    • Job operations
    • Direct methods
  7. Save the new settings.

If you want to turn on diagnostics settings with PowerShell, use the following code:

Connect-AzureRmAccount -EnvironmentName AzureChinaCloud
Select-AzureRmSubscription -SubscriptionName <subscription that includes your IoT Hub>
Set-AzureRmDiagnosticSetting -ResourceId <your resource Id> -ServiceBusRuleId <your service bus rule Id> -Enabled $true

New settings take effect in about 10 minutes. After that, logs appear in the destinations that you configured on the Diagnostics settings blade. For more information about configuring diagnostics, see Collect and consume log data from your Azure resources.

Understand the logs

Azure Monitor tracks different operations that occur in IoT Hub. Each category has a schema that defines how events in that category are reported.

Connections

The connections category tracks device connect and disconnect events from an IoT hub, as well as errors. This category is useful for identifying unauthorized connection attempts and for alerting when you lose connection to devices.

Note

For reliable connection status of devices, check Device heartbeat.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "deviceConnect",
            "category": "Connections",
            "level": "Information",
            "properties": "{\"deviceId\":\"<deviceId>\",\"protocol\":\"<protocol>\",\"authType\":\"{\\\"scope\\\":\\\"device\\\",\\\"type\\\":\\\"sas\\\",\\\"issuer\\\":\\\"iothub\\\",\\\"acceptingIpFilterRule\\\":null}\",\"maskedIpAddress\":\"<maskedIpAddress>\"}",
            "location": "Resource location"
        }
    ]
}
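In every category, the `properties` field is a JSON document serialized as a string. In the connections sample, its `authType` value is serialized a second time. The Python sketch below shows one way to decode both layers. It assumes the `{"records": [...]}` envelope shown above. The function name `decode_connection_records` and the sample values in the usage block are hypothetical.

```python
import json
from typing import Any


def decode_connection_records(payload: str) -> list[dict[str, Any]]:
    """Return one flat dict per record, with `properties` and `authType` decoded."""
    decoded = []
    for record in json.loads(payload).get("records", []):
        # First layer: `properties` is a JSON document stored as a string.
        props = json.loads(record.get("properties", "{}"))
        # Second layer: `authType` is another JSON document stored as a string.
        auth = props.get("authType")
        if isinstance(auth, str):
            props["authType"] = json.loads(auth)
        decoded.append(
            {
                "time": record.get("time"),
                "operationName": record.get("operationName"),
                "deviceId": props.get("deviceId"),
                "protocol": props.get("protocol"),
                "authType": props.get("authType"),
                "maskedIpAddress": props.get("maskedIpAddress"),
            }
        )
    return decoded


if __name__ == "__main__":
    # Hypothetical sample built with the same nesting as the schema above.
    auth = json.dumps({"scope": "device", "type": "sas", "issuer": "iothub", "acceptingIpFilterRule": None})
    props = json.dumps({"deviceId": "thermostat-01", "protocol": "Mqtt", "authType": auth, "maskedIpAddress": "10.0.0.XXX"})
    payload = json.dumps({"records": [{"time": "2019-06-01T10:00:00Z", "operationName": "deviceConnect",
                                       "category": "Connections", "level": "Information", "properties": props}]})
    for row in decode_connection_records(payload):
        print(row["deviceId"], row["operationName"], row["authType"]["type"])
```

Decoding into a flat dict keeps the downstream alerting logic independent of how many times a field was string-encoded.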

Cloud-to-device commands

The cloud-to-device commands category tracks errors that occur at the IoT hub and are related to the cloud-to-device message pipeline. This category includes errors that occur when:

  • Sending cloud-to-device messages (like unauthorized sender errors),
  • Receiving cloud-to-device messages (like delivery count exceeded errors), and
  • Receiving cloud-to-device message feedback (like feedback expired errors).

This category does not catch errors that occur when a cloud-to-device message is delivered successfully but then handled improperly by the device.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "messageExpired",
            "category": "C2DCommands",
            "level": "Error",
            "resultType": "Event status",
            "resultDescription": "MessageDescription",
            "properties": "{\"deviceId\":\"<deviceId>\",\"messageId\":\"<messageId>\",\"messageSizeInBytes\":\"<messageSize>\",\"protocol\":\"Amqp\",\"deliveryAcknowledgement\":\"<None, NegativeOnly, PositiveOnly, Full>\",\"deliveryCount\":\"0\",\"expiryTime\":\"<timestamp>\",\"timeInSystem\":\"<timeInSystem>\",\"ttl\":<ttl>, \"EventProcessedUtcTime\":\"<UTC timestamp>\",\"EventEnqueuedUtcTime\":\"<UTC timestamp>\", \"maskedIpAddress\": \"<maskedIpAddress>\", \"statusCode\": \"4XX\"}",
            "location": "Resource location"
        }
    ]
}
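To see which cloud-to-device failures dominate, you could tally error records by operation name and status code. The sketch below assumes the record layout shown above. The helper `tally_c2d_errors` and the status codes in the usage block are illustrative and are not values documented here.

```python
import json
from collections import Counter


def tally_c2d_errors(payload: str) -> Counter:
    """Count C2DCommands error records by (operationName, statusCode)."""
    counts: Counter = Counter()
    for record in json.loads(payload).get("records", []):
        # Skip other categories and non-error levels.
        if record.get("category") != "C2DCommands" or record.get("level") != "Error":
            continue
        props = json.loads(record.get("properties", "{}"))
        counts[(record.get("operationName"), props.get("statusCode"))] += 1
    return counts


if __name__ == "__main__":
    def rec(op: str, status: str) -> dict:
        # Hypothetical record with only the fields the tally reads.
        return {"category": "C2DCommands", "level": "Error", "operationName": op,
                "properties": json.dumps({"deviceId": "d1", "statusCode": status})}

    payload = json.dumps({"records": [rec("messageExpired", "410"), rec("messageExpired", "410"), rec("send", "401")]})
    for (op, status), n in tally_c2d_errors(payload).most_common():
        print(op, status, n)
```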

Device identity operations

The device identity operations category tracks errors that occur when you attempt to create, update, or delete an entry in your IoT hub's identity registry. Tracking this category is useful for provisioning scenarios.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "get",
            "category": "DeviceIdentityOperations",
            "level": "Error",    
            "resultType": "Event status",
            "resultDescription": "MessageDescription",
            "properties": "{\"maskedIpAddress\":\"<maskedIpAddress>\",\"deviceId\":\"<deviceId>\", \"statusCode\":\"4XX\"}",
            "location": "Resource location"
        }
    ]
}

Routes

The message routing category tracks errors that occur during message route evaluation, as well as endpoint health as perceived by IoT Hub. This category includes events such as:

  • A rule evaluates to "undefined",
  • IoT Hub marks an endpoint as dead, or
  • Any error received from an endpoint.

This category does not include specific errors about the messages themselves (like device throttling errors), which are reported under the "device telemetry" category.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "endpointUnhealthy",
            "category": "Routes",
            "level": "Error",
            "properties": "{\"deviceId\": \"<deviceId>\",\"endpointName\":\"<endpointName>\",\"messageId\":<messageId>,\"details\":\"<errorDetails>\",\"routeName\": \"<routeName>\"}",
            "location": "Resource location"
        }
    ]
}
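An unhealthy endpoint usually matters more than any single record, so a consumer might group routes errors by endpoint. The sketch below assumes the layout of the `endpointUnhealthy` sample above. Only that operation name comes from the sample, so pass any other operation names you observe through the hypothetical `operations` parameter of the hypothetical `unhealthy_endpoints` helper.

```python
import json


def unhealthy_endpoints(payload: str, operations=frozenset({"endpointUnhealthy"})) -> dict[str, list[str]]:
    """Map each endpoint name to the error details reported for it, in payload order."""
    result: dict[str, list[str]] = {}
    for record in json.loads(payload).get("records", []):
        if record.get("category") != "Routes" or record.get("operationName") not in operations:
            continue
        props = json.loads(record.get("properties", "{}"))
        endpoint = props.get("endpointName", "<unknown>")
        result.setdefault(endpoint, []).append(props.get("details", ""))
    return result


if __name__ == "__main__":
    # Hypothetical record following the Routes sample schema.
    record = {"category": "Routes", "operationName": "endpointUnhealthy", "level": "Error",
              "properties": json.dumps({"endpointName": "storage-archive", "details": "timeout", "routeName": "r1"})}
    print(unhealthy_endpoints(json.dumps({"records": [record]})))
```

Because diagnostic logs are not guaranteed to be ordered, treat the list for each endpoint as a set of observations rather than a timeline.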

Device telemetry

The device telemetry category tracks errors that occur at the IoT hub and are related to the telemetry pipeline. This category includes errors that occur when sending telemetry events (such as throttling) and receiving telemetry events (such as unauthorized reader). This category cannot catch errors caused by code running on the device itself.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "ingress",
            "category": "DeviceTelemetry",
            "level": "Error",
            "resultType": "Event status",
            "resultDescription": "MessageDescription",
            "properties": "{\"deviceId\":\"<deviceId>\",\"batching\":\"0\",\"messageSizeInBytes\":\"<messageSizeInBytes>\",\"EventProcessedUtcTime\":\"<UTC timestamp>\",\"EventEnqueuedUtcTime\":\"<UTC timestamp>\",\"partitionId\":\"1\"}", 
            "location": "Resource location"
        }
    ]
}
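The device telemetry records carry `deviceId` and `messageSizeInBytes`, so you can build a per-device summary to spot devices that repeatedly hit errors such as throttling. The sketch below assumes that `messageSizeInBytes` holds an integer serialized as a string, as in the sample above. `telemetry_errors_by_device` is a hypothetical helper.

```python
import json
from collections import defaultdict


def telemetry_errors_by_device(payload: str) -> dict[str, dict[str, int]]:
    """Summarize DeviceTelemetry error records per device: error count and total bytes."""
    summary: dict[str, dict[str, int]] = defaultdict(lambda: {"errors": 0, "bytes": 0})
    for record in json.loads(payload).get("records", []):
        if record.get("category") != "DeviceTelemetry" or record.get("level") != "Error":
            continue
        props = json.loads(record.get("properties", "{}"))
        entry = summary[props.get("deviceId", "<unknown>")]
        entry["errors"] += 1
        # messageSizeInBytes is a string in the sample schema, so convert it.
        entry["bytes"] += int(props.get("messageSizeInBytes", 0))
    return dict(summary)


if __name__ == "__main__":
    # Hypothetical records following the DeviceTelemetry sample schema.
    records = [
        {"category": "DeviceTelemetry", "level": "Error", "operationName": "ingress",
         "properties": json.dumps({"deviceId": dev, "messageSizeInBytes": size, "partitionId": "1"})}
        for dev, size in [("sensor-7", "512"), ("sensor-7", "256"), ("sensor-9", "128")]
    ]
    print(telemetry_errors_by_device(json.dumps({"records": records})))
```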

File upload operations

The file upload category tracks errors that occur at the IoT hub and are related to file upload functionality. This category includes:

  • Errors that occur with the SAS URI, such as when it expires before a device notifies the hub of a completed upload.
  • Failed uploads reported by the device.
  • Errors that occur when a file is not found in storage during IoT Hub notification message creation.

This category cannot catch errors that occur directly while the device is uploading a file to storage.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "ingress",
            "category": "FileUploadOperations",
            "level": "Error",
            "resultType": "Event status",
            "resultDescription": "MessageDescription",
            "durationMs": "1",
            "properties": "{\"deviceId\":\"<deviceId>\",\"protocol\":\"<protocol>\",\"authType\":\"{\\\"scope\\\":\\\"device\\\",\\\"type\\\":\\\"sas\\\",\\\"issuer\\\":\\\"iothub\\\",\\\"acceptingIpFilterRule\\\":null}\",\"blobUri\":\"http://bloburi.com\"}",
            "location": "Resource location"
        }
    ]
}

Cloud-to-device twin operations

The cloud-to-device twin operations category tracks service-initiated events on device twins. These operations can include get twin, update or replace tags, and update or replace desired properties.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "read",
            "category": "C2DTwinOperations",
            "level": "Information",
            "durationMs": "1",
            "properties": "{\"deviceId\":\"<deviceId>\",\"sdkVersion\":\"<sdkVersion>\",\"messageSize\":\"<messageSize>\"}",
            "location": "Resource location"
        }
    ]
}

Device-to-cloud twin operations

The device-to-cloud twin operations category tracks device-initiated events on device twins. These operations can include get twin, update reported properties, and subscribe to desired properties.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "update",
            "category": "D2CTwinOperations",
            "level": "Information",
            "durationMs": "1",
            "properties": "{\"deviceId\":\"<deviceId>\",\"protocol\":\"<protocol>\",\"authenticationType\":\"{\\\"scope\\\":\\\"device\\\",\\\"type\\\":\\\"sas\\\",\\\"issuer\\\":\\\"iothub\\\",\\\"acceptingIpFilterRule\\\":null}\"}",
            "location": "Resource location"
        }
    ]
}

Twin queries

The twin queries category reports on query requests for device twins that are initiated in the cloud.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "query",
            "category": "TwinQueries",
            "level": "Information",
            "durationMs": "1",
            "properties": "{\"query\":\"<twin query>\",\"sdkVersion\":\"<sdkVersion>\",\"messageSize\":\"<messageSize>\",\"pageSize\":\"<pageSize>\", \"continuation\":\"<true, false>\", \"resultSize\":\"<resultSize>\"}",
            "location": "Resource location"
        }
    ]
}
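The `durationMs` field on twin query records can help you find expensive queries. The hypothetical helper `slowest_twin_queries` below sorts TwinQueries records by duration. It assumes that `durationMs` is an integer serialized as a string, as in the sample above.

```python
import json


def slowest_twin_queries(payload: str, top: int = 5) -> list[tuple[int, str]]:
    """Return up to `top` (durationMs, query) pairs from TwinQueries records, slowest first."""
    rows = []
    for record in json.loads(payload).get("records", []):
        if record.get("category") != "TwinQueries":
            continue
        props = json.loads(record.get("properties", "{}"))
        # durationMs is serialized as a string in the sample schema.
        rows.append((int(record.get("durationMs", 0)), props.get("query", "")))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    # Hypothetical record following the TwinQueries sample schema.
    record = {
        "category": "TwinQueries",
        "operationName": "query",
        "durationMs": "120",
        "properties": json.dumps({"query": "SELECT * FROM devices WHERE tags.location = 'plant1'"}),
    }
    print(slowest_twin_queries(json.dumps({"records": [record]})))
```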

Jobs operations

The jobs operations category reports on job requests to update device twins or invoke direct methods on multiple devices. These requests are initiated in the cloud.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "jobCompleted",
            "category": "JobsOperations",
            "level": "Information",
            "durationMs": "1",
            "properties": "{\"jobId\":\"<jobId>\", \"sdkVersion\": \"<sdkVersion>\",\"messageSize\": <messageSize>,\"filter\":\"DeviceId IN ['1414ded9-b445-414d-89b9-e48e8c6285d5']\",\"startTimeUtc\":\"Wednesday, September 13, 2017\",\"duration\":\"0\"}",
            "location": "Resource location"
        }
    ]
}

Direct methods

The direct methods category tracks request-response interactions sent to individual devices. These requests are initiated in the cloud.

{
    "records": 
    [
        {
            "time": "UTC timestamp",
            "resourceId": "Resource Id",
            "operationName": "send",
            "category": "DirectMethods",
            "level": "Information",
            "durationMs": "1",
            "properties": "{\"deviceId\":\"<deviceId>\", \"RequestSize\": 1, \"ResponseSize\": 1, \"sdkVersion\": \"2017-07-11\"}",
            "location": "Resource location"
        }
    ]
}

Read logs from Azure Event Hubs

After you set up event logging through diagnostics settings, you can create applications that read out the logs so that you can take action based on the information in them. This sample code retrieves logs from an event hub:

using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Web.Script.Serialization; // requires a reference to System.Web.Extensions
using Microsoft.ServiceBus.Messaging;  // WindowsAzure.ServiceBus NuGet package

class Program
{
    static string connectionString = "{your AMS eventhub endpoint connection string}";
    static string monitoringEndpointName = "{your AMS event hub endpoint name}";
    static EventHubClient eventHubClient;

    // Each event hub message wraps one or more log entries in a "records" array.
    public class AzureMonitorDiagnosticLogBatch
    {
        public List<AzureMonitorDiagnosticLog> records { get; set; }
    }

    // This is the Diagnostic Settings schema
    public class AzureMonitorDiagnosticLog
    {
        public string time { get; set; }
        public string resourceId { get; set; }
        public string operationName { get; set; }
        public string category { get; set; }
        public string level { get; set; }
        public string resultType { get; set; }
        public string resultDescription { get; set; }
        public string durationMs { get; set; }
        public string callerIpAddress { get; set; }
        public string correlationId { get; set; }
        public string identity { get; set; }
        public string location { get; set; }
        // "properties" arrives as a JSON-encoded string; deserialize it separately if needed.
        public string properties { get; set; }
    }

    static void Main(string[] args)
    {
        Console.WriteLine("Monitoring. Press Enter key to exit.\n");
        eventHubClient = EventHubClient.CreateFromConnectionString(connectionString, monitoringEndpointName);
        var d2cPartitions = eventHubClient.GetRuntimeInformation().PartitionIds;
        var cts = new CancellationTokenSource();
        var tasks = new List<Task>();
        // Start one receive loop per partition.
        foreach (string partition in d2cPartitions)
        {
            tasks.Add(ReceiveMessagesFromDeviceAsync(partition, cts.Token));
        }
        Console.ReadLine();
        Console.WriteLine("Exiting...");
        cts.Cancel();
        Task.WaitAll(tasks.ToArray());
    }

    private static async Task ReceiveMessagesFromDeviceAsync(string partition, CancellationToken ct)
    {
        // Read only events enqueued from now on.
        var eventHubReceiver = eventHubClient.GetDefaultConsumerGroup().CreateReceiver(partition, DateTime.UtcNow);
        var serializer = new JavaScriptSerializer();
        while (!ct.IsCancellationRequested)
        {
            // Wait up to 10 seconds for the next event; null means nothing arrived.
            EventData eventData = await eventHubReceiver.ReceiveAsync(TimeSpan.FromSeconds(10));
            if (eventData == null)
            {
                continue;
            }
            string data = Encoding.UTF8.GetString(eventData.GetBytes());
            Console.WriteLine("Message received. Partition: {0} Data: '{1}'", partition, data);
            // Deserialize the JSON payload into Azure Monitor log objects.
            var batch = serializer.Deserialize<AzureMonitorDiagnosticLogBatch>(data);
            if (batch?.records == null)
            {
                continue;
            }
            foreach (var log in batch.records)
            {
                Console.WriteLine("{0} {1} {2}", log.category, log.operationName, log.level);
            }
        }
        await eventHubReceiver.CloseAsync();
    }
}
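If you work in Python rather than C#, here is a roughly equivalent sketch. It assumes the `azure-eventhub` package (version 5.x), which the original sample does not use. The connection string and event hub name are placeholders passed on the command line. `handle_body` is a hypothetical helper that does the same per-record printing as the C# loop. The SDK is imported inside `main()` so that `handle_body` can be tried without the SDK installed.

```python
import json
import sys


def handle_body(partition_id: str, body: str) -> list[tuple[str, str, str]]:
    """Parse one event body and return (category, operationName, level) per log record."""
    summary = []
    for record in json.loads(body).get("records", []):
        summary.append((record.get("category"), record.get("operationName"), record.get("level")))
    for category, operation, level in summary:
        print(f"partition {partition_id}: {category} {operation} {level}")
    return summary


def main(connection_string: str, event_hub_name: str) -> None:
    # Imported lazily so handle_body() works without the SDK installed.
    from azure.eventhub import EventHubConsumerClient

    client = EventHubConsumerClient.from_connection_string(
        connection_string, consumer_group="$Default", eventhub_name=event_hub_name
    )

    def on_event(partition_context, event):
        if event is not None:
            handle_body(partition_context.partition_id, event.body_as_str())

    with client:
        # "@latest" starts at the end of each partition, similar to DateTime.UtcNow in the C# sample.
        # receive() blocks until interrupted (for example, with Ctrl+C).
        client.receive(on_event=on_event, starting_position="@latest")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```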

Use Azure Resource Health

Use Azure Resource Health to monitor whether your IoT hub is up and running. You can also learn whether a regional outage is impacting the health of your IoT hub. To understand specific details about the health state of your IoT hub, we recommend that you use Azure Monitor.

Azure IoT Hub indicates health at a regional level. If a regional outage impacts your IoT hub, the health status shows as Unknown. To learn more, see Resource types and health checks in Azure resource health.

To check the health of your IoT hubs, follow these steps:

  1. Sign in to the Azure portal.
  2. Navigate to Service Health > Resource health.
  3. From the drop-down boxes, select your subscription, then select IoT Hub as the resource type.

To learn more about how to interpret health data, see Azure resource health overview.

Next steps