Tutorial: Set up and use metrics and diagnostic logs with an IoT hub

If you have an IoT Hub solution running in production, you want to set up some metrics and enable diagnostic logs. Then, if a problem occurs, you have data to look at that will help you diagnose the problem and fix it more quickly. In this article, you'll see how to enable the diagnostic logs and how to check them for errors. You'll also set up some metrics to watch, and alerts that fire when the metrics hit a certain boundary. For example, you could have an email sent to you when the number of telemetry messages sent exceeds a specific boundary, or when the number of messages used gets close to the quota of messages allowed per day for the IoT hub.

An example use case is a gas station where the pumps are IoT devices that communicate with an IoT hub. Credit cards are validated, and the final transaction is written to a data store. If the IoT devices stop connecting to the hub and sending messages, the problem is much more difficult to fix if you have no visibility into what's going on.

This tutorial uses the Azure sample from the IoT Hub routing tutorial to send messages to the IoT hub.

In this tutorial, you perform the following tasks:

  • Using Azure CLI, create an IoT hub, a simulated device, and a storage account.
  • Enable diagnostic logs.
  • Enable metrics.
  • Set up alerts for those metrics.
  • Download and run an app that simulates an IoT device sending messages to the hub.
  • Run the app until the alerts begin to fire.
  • View the metrics results and check the diagnostic logs.

Prerequisites

  • An Azure subscription. If you don't have an Azure subscription, create a trial account before you begin.

  • Visual Studio, installed on your machine.

  • An email account capable of receiving mail.

  • Make sure that port 8883 is open in your firewall. The device sample in this tutorial uses the MQTT protocol, which communicates over port 8883. This port may be blocked in some corporate and educational network environments. For more information and ways to work around this issue, see Connecting to IoT Hub (MQTT).
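If you're not sure whether port 8883 is reachable from your network, you can check with a quick TCP connection attempt. The Python sketch below does this. The helper name `port_reachable` is hypothetical and is not part of any Azure tool. A successful result only shows that a TCP handshake on the port works. It does not show that MQTT or TLS negotiation will succeed. It also needs a host name that resolves, such as your hub's host name once the hub exists.

```python
import socket


def port_reachable(host: str, port: int = 8883, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures all end up here.
        return False
```

For example, after running the setup script, `port_reachable("ContosoTestHub12345.azure-devices.cn")` should return True if nothing on the network path blocks the port. That hub name is hypothetical, so substitute your hub's actual name.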

Set up resources

For this tutorial, you need an IoT hub, a storage account, and a simulated IoT device. You can create these resources using Azure CLI or Azure PowerShell. Use the same resource group and location for all of the resources. Then, at the end, you can remove everything in one step by deleting the resource group.

These are the required steps.

  1. Create a resource group.

  2. Create an IoT hub.

  3. Create a standard V1 storage account with Standard_LRS replication.

  4. Create a device identity for the simulated device that sends messages to your hub. Save the key for the testing phase.

Set up resources using Azure CLI

Copy and paste this script into a shell where you are already signed in to Azure CLI. The script then runs one line at a time. The new resources are created in the resource group ContosoResources.

The variables that must be globally unique have $RANDOM concatenated to them. When the script runs and the variables are set, a random number is generated and appended to the end of the fixed string, which makes a name conflict unlikely. In the rest of this tutorial, the hub and storage account are called ContosoTestHub and contosostoragemon. Substitute the full names, including the random suffix, that the script prints.
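The naming scheme can be sketched in Python as follows. This is an illustration and is not part of the script. `with_random_suffix` mimics bash $RANDOM, which yields an integer from 0 to 32767, so the suffix lowers the chance of a collision but does not guarantee a unique name. The function names are hypothetical. The rule encoded in `is_valid_storage_account_name` is that storage account names must be 3 to 24 lowercase letters and digits. That rule is why the storage account base name is kept short and lowercase.

```python
import random
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
STORAGE_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def with_random_suffix(base: str, rng: random.Random) -> str:
    """Append a number in the same range as bash $RANDOM (0 to 32767)."""
    return f"{base}{rng.randint(0, 32767)}"


def is_valid_storage_account_name(name: str) -> bool:
    """Check the whole name against the storage account naming rule."""
    return STORAGE_NAME_RE.fullmatch(name) is not None


rng = random.Random()
name = with_random_suffix("contosostoragemon", rng)
print(name, is_valid_storage_account_name(name))
```

"contosostoragemon" has 17 characters, so even the largest suffix (32767) keeps the name at 22 characters, within the limit. A longer base name could push the result past 24 characters.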


# This is the IoT extension for Azure CLI.
# You only need to install it the first time.
# You need it to create the device identity.
az extension add --name azure-iot

# Set the values for the resource names that don't have to be globally unique.
# The resources that have to have unique names are named in the script below
#   with a random number concatenated to the name so you can probably just
#   run this script, and it will work with no conflicts.
location=chinaeast
resourceGroup=ContosoResources
iotDeviceName=Contoso-Test-Device 

# Create the resource group to be used
#   for all the resources for this tutorial.
az group create --name $resourceGroup \
    --location $location

# The IoT hub name must be globally unique, so add a random number to the end.
iotHubName=ContosoTestHub$RANDOM
echo "IoT hub name = " $iotHubName

# Create the IoT hub in the Free tier.
az iot hub create --name $iotHubName \
    --resource-group $resourceGroup \
    --sku F1 --location $location

# The storage account name must be globally unique, so add a random number to the end.
storageAccountName=contosostoragemon$RANDOM
echo "Storage account name = " $storageAccountName

# Create the storage account.
az storage account create --name $storageAccountName \
    --resource-group $resourceGroup \
    --location $location \
    --sku Standard_LRS

# Create the IoT device identity to be used for testing.
az iot hub device-identity create --device-id $iotDeviceName \
    --hub-name $iotHubName 

# Retrieve the information about the device identity, then copy the primary key to
#   Notepad. You need this to run the device simulation during the testing phase.
az iot hub device-identity show --device-id $iotDeviceName \
    --hub-name $iotHubName
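Instead of copying the key by hand, you can pull it out of the JSON that the show command prints. Below is a minimal Python sketch. It assumes the output was saved to a file and that the key sits at authentication.symmetricKey.primaryKey, which is the layout for a device that uses symmetric-key authentication. The function name is hypothetical. Azure CLI's global arguments `--query authentication.symmetricKey.primaryKey --output tsv` should give the same value directly.

```python
import json


def primary_key_from_identity(identity_json: str) -> str:
    """Return the primary symmetric key from `az iot hub device-identity show` output.

    Assumes the layout {"authentication": {"symmetricKey": {"primaryKey": ...}}}.
    """
    identity = json.loads(identity_json)
    key = identity["authentication"]["symmetricKey"]["primaryKey"]
    if not key:
        raise ValueError("device identity has no primary symmetric key")
    return key


# Usage, after redirecting the show command's output to device.json:
# with open("device.json", encoding="utf-8") as f:
#     print(primary_key_from_identity(f.read()))
```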

Note

When you create the device identity, you might get the following error: No keys found for policy iothubowner of IoT Hub ContosoTestHub. To fix this error, update the Azure CLI IoT extension, and then run the last two commands in the script again.

Here is the command to update the extension.

az extension update --name azure-iot

Enable the diagnostic logs

Diagnostic logs are disabled by default when you create a new IoT hub. In this section, you enable the diagnostic logs for your hub.

  1. First, if you're not already on your hub in the portal, click Resource groups and click the resource group ContosoResources. Select the hub from the list of resources displayed.

  2. Look for the Monitoring section in the IoT Hub blade. Click Diagnostic settings.

    Screenshot showing the Diagnostic settings section of the IoT Hub blade.

  3. Make sure the subscription and resource group are correct. Under Resource Type, uncheck Select All, then look for and check IoT Hub. (A checkmark appears next to Select All again; ignore it.) Under Resource, select the hub name. Your screen should look like this image:

    Screenshot showing the Diagnostic settings section of the IoT Hub blade.

  4. Now click Turn on diagnostics. The Diagnostics settings pane is displayed. Name the diagnostic setting "diags-hub".

  5. Check Archive to a storage account.

    Screenshot showing how to set the diagnostics to archive to a storage account.

    Click Configure to open the Select a storage account screen, select the correct account (contosostoragemon), and click OK to return to the Diagnostics settings pane.

    Screenshot showing how to set the diagnostic logs to archive to a storage account.

  6. Under LOG, check Connections and Device Telemetry, and set Retention (days) to 7 days for each. Your Diagnostic settings screen should now look like this image:

    Screenshot showing the final diagnostic logs settings.

  7. Click Save to save the settings. Close the Diagnostics settings pane.

Later, when you look at the diagnostic logs, you'll be able to see the connect and disconnect logging for the device.
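For reference, the portal choices in steps 5 and 6 correspond to a diagnostic setting whose log entries look roughly like the structure below. This is a sketch under assumptions. The category names Connections and DeviceTelemetry, and the category, enabled, and retentionPolicy field names, are our reading of the Azure Monitor diagnostic settings schema. The helper function is hypothetical. Check the schema before passing this output to `az monitor diagnostic-settings create --logs`.

```python
import json


def diagnostic_log_entries(categories, retention_days=7):
    """Build the assumed "logs" array for a diagnostic setting."""
    return [
        {
            "category": category,
            "enabled": True,
            "retentionPolicy": {"enabled": True, "days": retention_days},
        }
        for category in categories
    ]


# Assumed to match the portal's "Connections" and "Device Telemetry" checkboxes.
logs = diagnostic_log_entries(["Connections", "DeviceTelemetry"])
print(json.dumps(logs, indent=2))
```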

Set up metrics

Now set up some metrics to watch when messages are sent to the hub.

  1. In the settings pane for the IoT hub, click the Metrics option in the Monitoring section.

  2. At the top of the screen, click Last 24 hours (Automatic). In the dropdown that appears, select Last 4 hours for Time Range, and set Time Granularity to 1 minute, local time. Click Apply to save these settings.

    Screenshot showing the metrics time settings.

  3. There is one metric entry by default. Leave the resource group and the metric namespace set to their defaults. In the Metric dropdown list, select Telemetry messages sent. Set Aggregation to Sum.

    Screenshot showing how to add a metric for telemetry messages sent.

  4. Now click Add metric to add another metric to the chart. For the resource, select your IoT hub (ContosoTestHub). Under Metric, select Total number of messages used. For Aggregation, select Avg.

    Now your screen shows the minimized metric for Telemetry messages sent, plus the new metric for Total number of messages used.

    Screenshot showing how to add a metric for telemetry messages sent.

    Click Pin to dashboard. This pins the chart to the dashboard of your Azure portal so you can access it again. If you don't pin it to the dashboard, your settings are not retained.
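The two metrics use different aggregations because they measure different things. Telemetry messages sent is a count per interval, so Sum gives the number of messages sent in each 1-minute grain. Total number of messages used behaves like a running total for the day. Summing its samples would count the same messages several times, while averaging them gives a representative value for the grain. The Python sketch below illustrates this with hypothetical 20-second samples. The function and data are illustrative only and do not reflect how Azure Monitor computes metrics internally.

```python
from collections import defaultdict
from statistics import mean


def aggregate_per_minute(samples, how):
    """Group (seconds, value) samples into 1-minute grains and aggregate each grain."""
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    grains = defaultdict(list)
    for t_seconds, value in samples:
        grains[t_seconds // 60].append(value)
    agg = sum if how == "sum" else mean
    return {minute: agg(values) for minute, values in sorted(grains.items())}


# Hypothetical messages sent in each 20-second slice (a count)...
sent = [(0, 100), (20, 120), (40, 80), (60, 90), (80, 110), (100, 100)]
# ...and the running daily total of messages used at the same moments.
used_total = [(0, 100), (20, 220), (40, 300), (60, 390), (80, 500), (100, 600)]

print(aggregate_per_minute(sent, "sum"))         # messages sent per minute
print(aggregate_per_minute(used_total, "mean"))  # typical running total per minute
```

Summing `used_total` instead would report 620 for the first minute, even though only 300 messages had been used by the end of it.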

Set up alerts

Go to the hub in the portal. Click Resource groups, select ContosoResources, then select the IoT hub ContosoTestHub.

IoT Hub has not been migrated to the metrics in Azure Monitor yet; you have to use classic alerts.

  1. Under Monitoring, click Alerts. This shows the main alert screen.

    Screenshot showing how to find classic alerts.

  2. To get to the classic alerts from here, click View classic alerts.

    Screenshot showing the classic alerts screen.

    Fill in the fields:

    Subscription: Leave this field set to your current subscription.

    Source: Set this field to Metrics.

    Resource group: Set this field to your current resource group, ContosoResources.

    Resource type: Set this field to IoT Hub.

    Resource: Select your IoT hub, ContosoTestHub.

  3. Click Add metric alert (classic) to set up a new alert.

    Fill in the fields:

    Name: Provide a name for your alert rule, such as telemetry-messages.

    Description: Provide a description of your alert, such as "alert when more than 1000 telemetry messages are sent".

    Source: Set this field to Metrics.

    Subscription, Resource group, and Resource should be set to the values you selected on the View classic alerts screen.

    Set Metric to Telemetry messages sent.

    Screenshot showing how to set up a classic alert for telemetry messages sent.

  4. Below the chart, set the following fields:

    Condition: Set to Greater than.

    Threshold: Set to 1000.

    Period: Set to Over the last 5 minutes.

    Notification email recipients: Enter your email address.

    Screenshot showing the bottom half of the alerts screen.

    Click OK to save the alert.

  5. Now set up another alert for Total number of messages used. This metric is useful if you want an alert when the number of messages used approaches the quota for the IoT hub, so you know that the hub will soon start rejecting messages.

    On the View classic alerts screen, click Add metric alert (classic), then fill in these fields on the Add rule pane.

    Name: Provide a name for your alert rule, such as number-of-messages-used.

    Description: Provide a description of your alert, such as "alert when getting close to quota".

    Source: Set this field to Metrics.

    Subscription, Resource group, and Resource should be set to the values you selected on the View classic alerts screen.

    Set Metric to Total number of messages used.

  6. Below the chart, fill in the following fields:

    Condition: Set to Greater than.

    Threshold: Set to 1000.

    Period: Set this field to Over the last 5 minutes.

    Notification email recipients: Enter your email address.

    Click OK to save the rule.

  7. You should now see two alerts in the classic alerts pane:

    Screenshot showing the classic alerts screen with the new alert rules.

  8. Close the alerts pane.

    With these settings, you get an alert when more than 1000 telemetry messages are sent within 5 minutes, and when the total number of messages used exceeds 1000.
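The two rules can be modeled roughly as shown below. This Python sketch makes assumptions about the semantics and is not the Azure Monitor implementation. It treats the telemetry rule as firing when the sum of the last five 1-minute values exceeds 1000. It treats the quota rule as firing when the latest running total exceeds 1000. The `ClassicMetricRule` class and the per-minute numbers are hypothetical. The last line relates the threshold to the 8,000-message daily quota of the Free (F1) tier that the script creates (`--sku F1`).

```python
from dataclasses import dataclass


@dataclass
class ClassicMetricRule:
    """Hypothetical model of a classic metric alert: aggregate > threshold over a window."""
    name: str
    threshold: float
    window_minutes: int = 5

    def fires(self, per_minute_values, aggregate):
        # Only the most recent `window_minutes` data points are considered.
        window = per_minute_values[-self.window_minutes:]
        return bool(window) and aggregate(window) > self.threshold


FREE_TIER_DAILY_QUOTA = 8_000  # messages per day for an F1 hub

telemetry_rule = ClassicMetricRule("telemetry-messages", threshold=1000)
quota_rule = ClassicMetricRule("number-of-messages-used", threshold=1000)

# Hypothetical per-minute data while the simulator runs.
sent_per_minute = [150, 300, 320, 310, 290, 305]
used_total_per_minute = [150, 450, 770, 1080, 1370, 1675]

print(telemetry_rule.fires(sent_per_minute, sum))                # sum over 5 minutes
print(quota_rule.fires(used_total_per_minute, lambda w: w[-1]))  # latest running total
print(f"{quota_rule.threshold / FREE_TIER_DAILY_QUOTA:.1%} of the daily quota")
```

A threshold of 1000 is only 12.5% of the F1 quota. It suits this test, but for a real "close to quota" warning you might choose a value much nearer 8,000.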

Run Simulated Device app

Earlier, in the script setup section, you created a device identity for the simulated IoT device. In this section, you download a .NET console app that simulates a device sending device-to-cloud messages to an IoT hub.

Download the solution for the IoT Device Simulation. This link downloads a repo with several applications in it; the solution you are looking for is in iot-hub/Tutorials/Routing/.

Double-click the solution file (SimulatedDevice.sln) to open the code in Visual Studio, then open Program.cs. Substitute {iot hub hostname} with the IoT hub host name. The format of the IoT hub host name is {iot-hub-name}.azure-devices.cn. For this tutorial, the hub host name is ContosoTestHub.azure-devices.cn, with your hub's random suffix added after ContosoTestHub. Next, replace the device key placeholder with the primary key you saved earlier when setting up the simulated device. Also make sure the device ID matches the identity the script created, Contoso-Test-Device, because device IDs are case-sensitive.

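     // Must match the device identity created by the script (device IDs are case-sensitive).
     static string myDeviceId = "Contoso-Test-Device";
     // Host name format: {iot-hub-name}.azure-devices.cn, including your hub's random suffix.
     static string iotHubUri = "ContosoTestHub.azure-devices.cn";
     // This is the primary key for the device. You can also find it in the portal:
     // select your IoT hub > IoT devices > select your device > copy the primary key.
     static string deviceKey = "{your device key here}";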
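Some tools and SDK samples take a single device connection string instead of separate host name, device ID, and key values. The Python sketch below assembles one in the common HostName=...;DeviceId=...;SharedAccessKey=... form. It also rejects values that are still placeholders. The helper names and the example hub name ContosoTestHub12345 are hypothetical. The default suffix azure-devices.cn matches the host-name format used in this tutorial.

```python
def iot_hub_host_name(hub_name: str, suffix: str = "azure-devices.cn") -> str:
    """Host name in the {iot-hub-name}.azure-devices.cn format used in this tutorial."""
    return f"{hub_name}.{suffix}"


def device_connection_string(hub_name: str, device_id: str, device_key: str) -> str:
    """Assemble HostName=...;DeviceId=...;SharedAccessKey=... from its parts."""
    for label, value in (("hub name", hub_name), ("device ID", device_id),
                         ("device key", device_key)):
        # Catch empty values and unreplaced {placeholders}.
        if not value or value.startswith("{"):
            raise ValueError(f"{label} is missing or still a placeholder: {value!r}")
    return (
        f"HostName={iot_hub_host_name(hub_name)};"
        f"DeviceId={device_id};"
        f"SharedAccessKey={device_key}"
    )
```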

Run and test

In Program.cs, change Task.Delay from 1000 to 10, which reduces the time between messages from 1 second to 0.01 seconds. Shortening this delay increases the number of messages sent.

await Task.Delay(10);
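A rough upper bound shows why this change matters for the alerts. The Python arithmetic below compares the two delays over the alerts' 5-minute period. It assumes each loop iteration waits only for Task.Delay and that sending takes no time, so real rates are lower. The function names are illustrative.

```python
def max_messages(window_seconds: int, delay_ms: int) -> int:
    """Upper bound on messages sent in a window if each loop only waits delay_ms."""
    return window_seconds * 1000 // delay_ms


def min_seconds_to_send(count: int, delay_ms: int) -> float:
    """Lower bound on the time needed to send `count` messages."""
    return count * delay_ms / 1000


WINDOW = 5 * 60  # the alerts' 5-minute period, in seconds
for delay in (1000, 10):
    print(f"Task.Delay({delay}): at most {max_messages(WINDOW, delay)} messages per 5 minutes")
print(f"8,000 messages at Task.Delay(10): no sooner than {min_seconds_to_send(8000, 10):.0f} s")
```

With a 1-second delay, at most 300 messages are sent in 5 minutes, so the 1000-message telemetry alert could never fire. With a 10-millisecond delay, the bound is 30,000. By the same arithmetic, the F1 hub's 8,000-message daily quota could be reached in as little as about 80 seconds. The hub may therefore start rejecting messages during the 10 to 15 minute run.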

Run the console application. Wait 10 to 15 minutes. You can see the messages being sent from the simulated device to the hub on the console screen of the application.

See the metrics in the portal

Open your metrics from the Dashboard. Change the time values to Last 30 minutes with a time granularity of 1 minute. The chart shows the telemetry messages sent and the total number of messages used, with the most recent values at the bottom of the chart.

Screenshot showing the metrics.

See the alerts

Go back to alerts. Click Resource groups, select ContosoResources, then select the hub ContosoTestHub. On the properties page displayed for the hub, select Alerts, then View classic alerts.

When the number of messages sent exceeds the limit, you start getting email alerts. To see if there are any active alerts, go to your hub and select Alerts. This page shows the alerts that are active and any warnings.

Screenshot showing the alerts that have fired.

Click the alert for telemetry messages. It shows the metric result and a chart with the results. The email sent to warn you that the alert fired looks like this image:

Screenshot of the email showing that the alert has fired.

See the diagnostic logs

You set up your diagnostic logs to be exported to blob storage. Go to your resource group and select your storage account contosostoragemon. Select Blobs, then open the container insights-logs-connections. Drill down until you get to the current date, and select the most recent file.

Screenshot showing how to drill down into the storage container to see the diagnostic logs.
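The drill-down follows a predictable folder layout. The Python sketch below computes the blob name for a given time inside the insights-logs-connections container. It assumes the Azure Monitor archive convention for resource logs, where blobs are grouped under the upper-cased resource ID and then by UTC year, month, day, and hour. The function name is hypothetical. Confirm the layout against what you see in your own container.

```python
from datetime import datetime, timezone


def log_blob_path(resource_id: str, when: datetime) -> str:
    """Blob name for the hour containing `when` (pass a timezone-aware datetime).

    Assumed layout: resourceId=<RESOURCE ID>/y=YYYY/m=MM/d=DD/h=HH/m=00/PT1H.json
    """
    t = when.astimezone(timezone.utc)  # folders are organized by UTC time
    return (
        f"resourceId={resource_id.upper()}"
        f"/y={t:%Y}/m={t:%m}/d={t:%d}/h={t:%H}/m=00/PT1H.json"
    )


# Example: the blob that should hold the current hour's connection logs.
# print(log_blob_path(hub_resource_id, datetime.now(timezone.utc)))
```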

Click Download to download the file, and then open it. You see the logs of the device connecting and disconnecting as it sends messages to the hub. Here is a sample:

{
  "time": "2018-12-17T18:11:25Z",
  "resourceId":
    "/SUBSCRIPTIONS/your-subscription-id/RESOURCEGROUPS/CONTOSORESOURCES/PROVIDERS/MICROSOFT.DEVICES/IOTHUBS/CONTOSOTESTHUB",
  "operationName": "deviceConnect",
  "category": "Connections",
  "level": "Information",
  "properties": {
    "deviceId": "Contoso-Test-Device",
    "protocol": "Mqtt",
    "authType": null,
    "maskedIpAddress": "73.162.215.XXX",
    "statusCode": null
  },
  "location": "chinaeast"
}
{
  "time": "2018-12-17T18:19:25Z",
  "resourceId":
    "/SUBSCRIPTIONS/your-subscription-id/RESOURCEGROUPS/CONTOSORESOURCES/PROVIDERS/MICROSOFT.DEVICES/IOTHUBS/CONTOSOTESTHUB",
  "operationName": "deviceDisconnect",
  "category": "Connections",
  "level": "Error",
  "resultType": "404104",
  "resultDescription": "DeviceConnectionClosedRemotely",
  "properties": {
    "deviceId": "Contoso-Test-Device",
    "protocol": "Mqtt",
    "authType": null,
    "maskedIpAddress": "73.162.215.XXX",
    "statusCode": "404"
  },
  "location": "chinaeast"
}
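When a file has more than a handful of entries, a script is easier than reading it by eye. The Python sketch below reads a downloaded log file that contains one or more concatenated JSON objects, as in the sample above. It counts records per operationName and lists the records logged at level Error, such as the 404104 DeviceConnectionClosedRemotely disconnect. The function names and the PT1H.json file name are assumptions. `record_properties` accepts properties either as an object, as shown above, or as a JSON-encoded string, in case your log files store it that way.

```python
import json
from collections import Counter


def iter_log_records(text):
    """Yield each JSON object from text holding one or more concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace (including newlines) between records.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        record, pos = decoder.raw_decode(text, pos)
        yield record


def record_properties(record):
    """Return the properties dict, decoding it first if it is a JSON string."""
    props = record.get("properties") or {}
    return json.loads(props) if isinstance(props, str) else props


def summarize(text):
    """Count operations and collect (time, deviceId, resultType, resultDescription) errors."""
    records = list(iter_log_records(text))
    operations = Counter(r.get("operationName") for r in records)
    errors = [
        (r.get("time"), record_properties(r).get("deviceId"),
         r.get("resultType"), r.get("resultDescription"))
        for r in records
        if r.get("level") == "Error"
    ]
    return operations, errors


# Usage:
# with open("PT1H.json", encoding="utf-8") as f:
#     operations, errors = summarize(f.read())
```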

Clean up resources

To remove all of the resources you've created in this tutorial, delete the resource group. This action deletes all resources contained within the group. In this case, it removes the IoT hub, the storage account, and the resource group itself. If you have pinned metrics to the dashboard, you have to remove those manually: click the three dots in the upper right-hand corner of each one and select Remove.

To remove the resource group, use the az group delete command. If you are in a new shell session, $resourceGroup is no longer set, so use the name ContosoResources instead.

az group delete --name $resourceGroup

Next steps

In this tutorial, you learned how to use metrics and diagnostic logs by performing the following tasks:

  • Using Azure CLI, create an IoT hub, a simulated device, and a storage account.
  • Enable diagnostic logs.
  • Enable metrics.
  • Set up alerts for those metrics.
  • Download and run an app that simulates an IoT device sending messages to the hub.
  • Run the app until the alerts begin to fire.
  • View the metrics results and check the diagnostic logs.

Advance to the next tutorial to learn how to manage the state of an IoT device.