Troubleshoot Live Video Analytics on IoT Edge

This article covers troubleshooting steps for Live Video Analytics (LVA) on Azure IoT Edge.

Troubleshoot deployment issues

Diagnostics

As part of your Live Video Analytics deployment, you set up Azure resources such as IoT Hub and IoT Edge devices. As a first step to diagnosing problems, always ensure that the Edge device is properly set up by following these instructions:

  1. Run the check command.
  2. Check your IoT Edge version.
  3. Check the status of the IoT Edge security manager and its logs.
  4. View the messages that are going through the IoT Edge hub.
  5. Restart containers.
  6. Check your firewall and port configuration rules.
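The first four checks can be scripted so that they run the same way every time. The following Python sketch assumes the IoT Edge 1.0/1.1 iotedge CLI and a systemd service named iotedge for the security manager; run_diagnostics and the exact command list are illustrative assumptions, not an official tool.

```python
import subprocess

# Illustrative commands for checks 1-4 above. Assumes the IoT Edge 1.0/1.1
# "iotedge" CLI and a systemd service named "iotedge" (the security manager).
DIAGNOSTIC_COMMANDS = [
    ["sudo", "iotedge", "check"],                                        # 1. check
    ["iotedge", "version"],                                              # 2. version
    ["sudo", "systemctl", "status", "iotedge", "--no-pager"],            # 3. status
    ["sudo", "journalctl", "-u", "iotedge", "--no-pager", "-n", "100"],  # 3. logs
    ["sudo", "iotedge", "logs", "edgeHub", "--tail", "100"],             # 4. edgeHub logs
]


def run_diagnostics(commands=DIAGNOSTIC_COMMANDS, dry_run=True):
    """Run each command in order and collect (command line, output) pairs.

    With dry_run=True the commands are only listed, which is useful for review
    before running them on the device.
    """
    results = []
    for cmd in commands:
        line = " ".join(cmd)
        if dry_run:
            results.append((line, None))
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        results.append((line, proc.stdout + proc.stderr))
    return results
```

Restarting containers and reviewing firewall rules are left out on purpose, because they change device state or depend on your network setup.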

Pre-deployment issues

If the edge infrastructure is fine, you can look for issues with the deployment manifest file. To deploy the Live Video Analytics on IoT Edge module on the IoT Edge device alongside any other IoT modules, you use a deployment manifest that contains the IoT Edge hub, IoT Edge agent, and other modules and their properties. You can use the following command to deploy the manifest file:

az iot edge set-modules --hub-name <iot-hub-name> --device-id lva-sample-device --content <path-to-deployment_manifest.json>

If the JSON code isn't well formed, you might receive the following error:
   Failed to parse JSON from file: '' for argument 'content' with exception: "Extra data: line 101 column 1 (char 5325)"

If you encounter this error, check the JSON for missing brackets or other issues with the structure of the file. To validate the file structure, you can use a client such as Notepad++ with the JSON Viewer plug-in, or an online tool such as JSON Formatter & Validator.
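You can also check the manifest locally before running az iot edge set-modules. The following minimal sketch uses Python's standard json module; the helper names are hypothetical. Python's JSONDecodeError messages use the same "Extra data: line N column M (char K)" format as the error above, so the reported position points at the same spot in the file.

```python
import json


def validate_manifest_text(text):
    """Return None if text is well-formed JSON, otherwise the parser's message.

    The message includes the line, column, and character offset of the
    first problem, for example "Extra data: line 2 column 1 (char 9)".
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


def validate_manifest_file(path):
    """Validate a deployment manifest file on disk."""
    with open(path, encoding="utf-8") as f:
        return validate_manifest_text(f.read())
```

An "Extra data" error usually means there is content after the top-level closing brace, often an extra bracket left over from an edit.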

During deployment: Diagnose with media graph direct methods

After the Live Video Analytics on IoT Edge module is deployed correctly on the IoT Edge device, you can create and run the media graph by invoking direct methods.

Note

The direct method calls should be made to the lvaEdge module only.

You can use the Azure portal to run a diagnosis of the media graph using direct methods:

  1. In the Azure portal, go to the IoT hub that's connected to your IoT Edge device.

  2. Look for Automatic device management, and then select IoT Edge.

  3. In the list of Edge devices, select the device that you want to diagnose.

    Screenshot of the Azure portal that displays the list of Edge devices.

  4. Check whether the response code is 200-OK. Other response codes for the IoT Edge runtime include:

    • 400 - The deployment configuration is malformed or invalid.
    • 417 - The device doesn't have a deployment configuration set.
    • 412 - The schema version in the deployment configuration is invalid.
    • 406 - The IoT Edge device is offline or not sending status reports.
    • 500 - An error occurred in the IoT Edge runtime.

    Tip

    If you experience issues running Azure IoT Edge modules in your environment, use the Azure IoT Edge standard diagnostic steps as a guide for troubleshooting and diagnostics.

  5. If the Specified in deployment and Reported by device columns indicate Yes, you can invoke direct methods on the Live Video Analytics on IoT Edge module. Select the module to go to a page where you can check the desired and reported properties and invoke direct methods. Keep in mind the following:

    • Checking the reported and desired properties can help you understand whether the module properties have synced with the deployment. If they haven't, you can restart your IoT Edge device.

    • Use the Direct methods guide to call a few methods, especially simple ones such as GraphTopologyList. The guide also specifies expected request and response payloads and error codes. After the simple direct methods succeed, you can be assured that the Live Video Analytics IoT Edge module is functionally OK.

      Screenshot of the Direct methods pane for an IoT Edge module.

Post deployment: Direct method error codes

  1. If you get a 501 status code, check that the direct method name is accurate. If the method name and request payload are accurate, you should get results along with a 200 success code.

  2. If the request payload is inaccurate, you'll get a 400 status code and a response payload that contains an error code and message, which should help you diagnose the issue with your direct method call.
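The status-code checks above can be captured in a small helper when you script your smoke tests. This is a sketch under stated assumptions. The "@apiVersion": "2.0" payload field is an assumption about the LVA 2.0 direct method contract, so confirm it in the Direct methods guide. The invoke helper assumes the azure-iot-hub Python package (IoTHubRegistryManager and CloudToDeviceMethod). All function names here are hypothetical.

```python
# Assumption: LVA 2.0 direct method payloads carry an "@apiVersion" field.
# Check the Direct methods guide for the version your module expects.
API_VERSION = "2.0"


def graph_topology_list_request(api_version=API_VERSION):
    """Method name and payload for the simple GraphTopologyList smoke test."""
    return "GraphTopologyList", {"@apiVersion": api_version}


def interpret_status(status, payload=None):
    """Map a direct method status code to the troubleshooting step above."""
    if status == 200:
        return "OK: the module is responding to direct methods."
    if status == 501:
        return "Not implemented: check the direct method name."
    if status == 400:
        # The response payload carries an error code and message.
        return f"Bad request: inspect the response payload: {payload}"
    return f"Unexpected status {status}: see the Direct methods guide."


def invoke(connection_string, device_id, method_name, payload, module_id="lvaEdge"):
    """Invoke a module direct method (requires: pip install azure-iot-hub)."""
    from azure.iot.hub import IoTHubRegistryManager
    from azure.iot.hub.models import CloudToDeviceMethod

    manager = IoTHubRegistryManager(connection_string)
    method = CloudToDeviceMethod(method_name=method_name, payload=payload)
    result = manager.invoke_device_module_method(device_id, module_id, method)
    return result.status, result.payload
```

For example, pass the output of graph_topology_list_request() to invoke, then pass the returned status and payload to interpret_status.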

Post deployment: Diagnose logs for issues during the run

The container logs for your IoT Edge module should contain diagnostics information to help debug your issues during module runtime. You can check the container logs for issues and self-diagnose them.

If you've run all the preceding checks and are still encountering issues, gather logs from the IoT Edge device with the support-bundle command for further analysis by the Azure team. You can contact us for support and to submit the collected logs.

Common error resolutions

Live Video Analytics is deployed as an IoT Edge module on the IoT Edge device, and it works collaboratively with the IoT Edge agent and hub modules. Some of the common errors that you'll encounter with the Live Video Analytics deployment are caused by issues with the underlying IoT infrastructure. The errors include:

Live Video Analytics working with external modules

By using media graph extension processors, Live Video Analytics can extend the media graph to send and receive data from other IoT Edge modules over the HTTP or gRPC protocols. As a specific example, a media graph can send video frames as images to an external inference module such as Yolo v3 and receive JSON-based analytics results over HTTP. In such a topology, the destination for the events is mostly the IoT hub. If you don't see the inference events on the hub, check the following:

  • Check whether the hub that the media graph is publishing to and the hub you're examining are the same. As you create multiple deployments, you might end up with multiple hubs and mistakenly check the wrong hub for events.

  • In the Azure portal, check whether the external module is deployed and running. In the example image here, rtspsim, yolov3, tinyyolov3, and logAnalyticsAgent are IoT Edge modules running outside the lvaEdge module.

    Screenshot that displays the running status of modules in Azure IoT Hub.

  • Check whether you're sending events to the correct URL endpoint. The external AI container exposes a URL and a port through which it receives and returns the data from POST requests. This URL is specified as the endpoint: url property for the HTTP extension processor. As seen in the topology URL, the endpoint is set to the inferencing URL parameter. Ensure that the default value for the parameter, or the passed-in value, is accurate. You can test whether it's working by using Client URL (cURL).

    For example, here is a Yolo v3 container that's running on a local machine with an IP address of 172.17.0.3.

    curl -X POST http://172.17.0.3/score -H "Content-Type: image/jpeg" --data-binary @<fullpath to jpg>
    

    Result returned:

    {"inferences": [{"type": "entity", "entity": {"tag": {"value": "car", "confidence": 0.8668569922447205}, "box": {"l": 0.3853073438008626, "t": 0.6063712999658677, "w": 0.04174524943033854, "h": 0.02989496027381675}}}]}
    

    Tip

    Use the docker inspect command to find the IP address of the container.

  • If you're running one or more instances of a graph that uses the media graph extension processor, you should use the samplingOptions field to manage the frames per second (fps) rate of the video feed.

    • In certain situations where the CPU or memory of the edge machine is highly utilized, you can lose certain inference events. To address this issue, set a low value for the maximumSamplesPerSecond property on the samplingOptions field. You can set it to 0.5 ("maximumSamplesPerSecond": "0.5") on each instance of the graph and then rerun the instance to check for inference events on the hub.
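If cURL isn't available, the same endpoint check can be done from Python. The following sketch uses only the standard library. It assumes a scoring URL such as http://172.17.0.3/score from the example above, and it assumes that responses follow the shape of the sample result. score_image and entity_tags are hypothetical helpers.

```python
import json
import urllib.request


def score_image(url, jpeg_path, timeout=10):
    """POST a JPEG file to an inference endpoint, like the curl command above."""
    with open(jpeg_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "image/jpeg"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def entity_tags(result, min_confidence=0.0):
    """Extract (tag, confidence) pairs from an {"inferences": [...]} result."""
    tags = []
    for inference in result.get("inferences", []):
        if inference.get("type") != "entity":
            continue  # skip non-entity inference types
        tag = inference["entity"]["tag"]
        if tag["confidence"] >= min_confidence:
            tags.append((tag["value"], tag["confidence"]))
    return tags
```

For the sample result above, entity_tags returns a single ("car", 0.8668569922447205) pair. An empty list for a frame that clearly contains objects points at the model, not at the network path.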

Multiple direct methods in parallel: timeout failure

Live Video Analytics on IoT Edge provides a direct method-based programming model that allows you to set up multiple topologies and multiple graph instances. As part of the topology and graph setup, you invoke multiple direct method calls on the IoT Edge module. If you invoke these method calls in parallel, especially the ones that start and stop the graphs, you might experience a timeout failure such as the following:

Assembly Initialization method Microsoft.Media.LiveVideoAnalytics.Test.Feature.Edge.AssemblyInitializer.InitializeAssemblyAsync threw exception. Microsoft.Azure.Devices.Common.Exceptions.IotHubException: Microsoft.Azure.Devices.Common.Exceptions.IotHubException:
{"Message":"{\"errorCode\":504101,\"trackingId\":\"55b1d7845498428593c2738d94442607-G:32-TimeStamp:05/15/2020 20:43:10-G:10-TimeStamp:05/15/2020 20:43:10\",\"message\":\"Timed out waiting for the response from device.\",\"info\":{},\"timestampUtc\":\"2020-05-15T20:43:10.3899553Z\"}","ExceptionMessage":""}. Aborting test execution.

We recommend that you don't call direct methods in parallel. Call them sequentially (that is, make one direct method call only after the previous one is finished).
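A minimal way to enforce this is to route every call through one loop that waits for each response before it sends the next call. In the following sketch, invoke is a hypothetical callable that returns a (status, payload) pair. Treating HTTP 504 as the device timeout shown above, and retrying once, are assumptions rather than documented LVA behavior.

```python
import time


def call_sequentially(invoke, calls, retries_on_timeout=1, backoff_s=2.0):
    """Issue direct method calls one at a time, never in parallel.

    invoke: hypothetical callable(method_name, payload) -> (status, payload).
    calls: list of (method_name, payload) pairs, in the order to run them.
    A 504 status is treated as a device timeout and retried after backoff_s.
    Stops at the first failure so later calls don't run against a graph
    that isn't fully set up.
    """
    results = []
    for name, payload in calls:
        attempts = 0
        status, body = invoke(name, payload)  # blocks until this call finishes
        while status == 504 and attempts < retries_on_timeout:
            attempts += 1
            time.sleep(backoff_s)
            status, body = invoke(name, payload)
        results.append((name, status, body))
        if status >= 400:
            break
    return results
```

For example, you can pass GraphTopologySet, GraphInstanceSet, and GraphInstanceActivate in that order. Each call starts only after the previous call returns.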

Collect logs for submitting a support ticket

When self-guided troubleshooting steps don't resolve your problem, go to the Azure portal and open a support ticket.

Warning

The logs may contain personally identifiable information (PII) such as your IP address. All local copies of the logs will be deleted as soon as we complete examining them and close the support ticket.

To gather the relevant logs that should be added to the ticket, follow the instructions below in order and upload the log files in the Details pane of the support request.

  1. Configure the Live Video Analytics module to collect Verbose Logs
  2. Turn on Debug Logs
  3. Reproduce the issue
  4. Connect to the virtual machine from the IoT Hub page in the portal
    1. Zip all the files in the debugLogs folder.

      Note

      These log files are not meant for self-diagnosis. They are meant for the Azure engineering team to analyze your issues.

      • In the following command, be sure to replace $DEBUG_LOG_LOCATION_ON_EDGE_DEVICE with the location of the debug logs on the Edge device that you set up earlier in Step 2.

        sudo apt install zip unzip  
        zip -r debugLogs.zip $DEBUG_LOG_LOCATION_ON_EDGE_DEVICE 
        
    2. Attach the debugLogs.zip file to the support ticket.

  5. Run the support-bundle command, collect the logs, and attach them to the support ticket.
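On a device where installing the zip package isn't an option, Python's standard library can produce an equivalent archive. This is a sketch. zip_debug_logs is a hypothetical helper, and log_dir stands in for $DEBUG_LOG_LOCATION_ON_EDGE_DEVICE. Unlike zip -r with an absolute path, this sketch stores paths relative to the log folder.

```python
import os
import shutil


def zip_debug_logs(log_dir, output_base="debugLogs"):
    """Create <output_base>.zip with every file under log_dir and return its path.

    Paths inside the archive are relative to log_dir.
    """
    if not os.path.isdir(log_dir):
        raise FileNotFoundError(f"Debug log directory not found: {log_dir}")
    return shutil.make_archive(output_base, "zip", root_dir=log_dir)
```

Write the archive outside the log folder, as the default output_base in the current directory usually does, so that the archive doesn't try to include itself.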

Configure Live Video Analytics module to collect Verbose Logs

Configure your Live Video Analytics module to collect Verbose logs by setting logLevel and logCategories as follows:

"logLevel": "Verbose",
"logCategories": "Application,Events,MediaPipeline",

You can do this in either of two ways:

  • In the Azure portal, by updating the Module Identity Twin properties of the Live Video Analytics module.
  • In your deployment manifest file, by adding these entries to the properties node of the Live Video Analytics module.
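The manifest option can be scripted. The following sketch assumes the standard IoT Edge manifest layout, where a module's desired properties live under modulesContent.<module>["properties.desired"], and it assumes the module is named lvaEdge. Both helper names are hypothetical.

```python
import json

VERBOSE_LOGGING = {
    "logLevel": "Verbose",
    "logCategories": "Application,Events,MediaPipeline",
}


def enable_verbose_logging(manifest, module="lvaEdge"):
    """Add the verbose logging entries to a module's desired properties in place."""
    module_twin = manifest["modulesContent"].setdefault(module, {})
    desired = module_twin.setdefault("properties.desired", {})
    desired.update(VERBOSE_LOGGING)  # existing desired properties are kept
    return manifest


def update_manifest_file(path, module="lvaEdge"):
    """Rewrite a deployment manifest file with verbose logging enabled."""
    with open(path, encoding="utf-8") as f:
        manifest = json.load(f)
    enable_verbose_logging(manifest, module)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
```

After you update the file, redeploy it with the az iot edge set-modules command shown earlier.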

Use the support-bundle command

When you need to gather logs from an IoT Edge device, the easiest way is to use the support-bundle command. This command collects:

  • Module logs
  • IoT Edge security manager and container engine logs
  • IoT Edge check JSON output
  • Useful debug information
  1. Run the support-bundle command with the --since flag to specify how much time you want your logs to cover. For example, 2h gets logs for the last two hours. You can change the value of this flag to include logs for different periods.

    sudo iotedge support-bundle --since 2h
    

    This command creates a file named support_bundle.zip in the directory where you ran the command.

  2. Attach the support_bundle.zip file to the support ticket.

Live Video Analytics debug logs

To configure the Live Video Analytics on IoT Edge module to generate debug logs, do the following:

  1. Sign in to the Azure portal, and go to your IoT hub.

  2. On the left pane, select IoT Edge.

  3. In the list of devices, select the ID of the target device.

  4. At the top of the pane, select Set Modules.

    Screenshot of the Set Modules button in the Azure portal.

  5. In the IoT Edge Modules section, look for and select lvaEdge.

  6. Select Container Create Options.

  7. In the Binds section, add the following entry:

    /var/local/mediaservices/logs:/var/lib/azuremediaservices/logs

    Note

    This entry binds the logs folder on the Edge device to the logs folder in the container. If you want to collect the logs in a different location, use the following entry instead, replacing $LOG_LOCATION_ON_EDGE_DEVICE with the location you want to use: /var/$LOG_LOCATION_ON_EDGE_DEVICE:/var/lib/azuremediaservices/logs

  8. Select Update.

  9. Select Review + Create. A successful validation message is posted under a green banner.

  10. Select Create.

  11. Update Module Identity Twin to set the DebugLogsDirectory parameter, which points to the directory in which the logs are collected:

    a. Under the Modules table, select lvaEdge.
    b. At the top of the pane, select Module Identity Twin. An editable pane opens.
    c. Under the desired key, add the following key/value pair:
    "DebugLogsDirectory": "/var/lib/azuremediaservices/logs"

    Note

    The bind from step 7 maps the logs folder between the Edge device and the container. If you want to collect the logs in a different location on the device:

    1. In the Binds section, create a binding for the debug log location, replacing $DEBUG_LOG_LOCATION_ON_EDGE_DEVICE and $DEBUG_LOG_LOCATION with the location you want: /var/$DEBUG_LOG_LOCATION_ON_EDGE_DEVICE:/var/$DEBUG_LOG_LOCATION
    2. Use the following setting, replacing $DEBUG_LOG_LOCATION with the location used in the previous step:
      "DebugLogsDirectory": "/var/$DEBUG_LOG_LOCATION"

    d. Select Save.

  12. You can stop log collection by clearing the value in Module Identity Twin. Go back to the Module Identity Twin page and update the following parameter as:

    "DebugLogsDirectory": ""
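If you manage the deployment manifest in source control rather than in the portal, you can apply the same two changes in code. The following sketch assumes that a module's settings carry createOptions as a JSON string with Docker's HostConfig.Binds list. It also assumes that the twin change is expressed as a desired-properties dictionary. add_bind and debug_logs_desired are hypothetical helpers.

```python
import json

LOG_BIND = "/var/local/mediaservices/logs:/var/lib/azuremediaservices/logs"


def add_bind(module_settings, bind=LOG_BIND):
    """Add a host:container bind to a module's createOptions JSON string."""
    options = json.loads(module_settings.get("createOptions") or "{}")
    binds = options.setdefault("HostConfig", {}).setdefault("Binds", [])
    if bind not in binds:  # avoid duplicate entries on repeated runs
        binds.append(bind)
    module_settings["createOptions"] = json.dumps(options)
    return module_settings


def debug_logs_desired(directory="/var/lib/azuremediaservices/logs"):
    """Desired properties for the module twin; pass "" to stop log collection."""
    return {"DebugLogsDirectory": directory}
```

The bind and the DebugLogsDirectory value must refer to the same container path. Otherwise, the module writes its debug logs to a folder that isn't mapped to the device.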

Best practices around logging

Monitoring and logging should help you understand the log taxonomy and how to generate logs that help debug issues with LVA.

Because gRPC server implementations differ across languages, there's no standard way to add logging inside the server.

For example, if you build a gRPC server using .NET Core, the gRPC service adds logs under the Grpc category. To enable detailed logs from gRPC, configure the Grpc prefixes to the Debug level in your appsettings.json file by adding the following items to the LogLevel subsection in Logging:

{
  "Logging": {
    "LogLevel": {
      "Default": "Debug",
      "System": "Information",
      "Microsoft": "Information",
      "Grpc": "Debug"
    }
  }
}

You can also configure this in the Program.cs file with ConfigureLogging:

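using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public static IHostBuilder CreateHostBuilder(string[] args) =>
    Host.CreateDefaultBuilder(args)
        .ConfigureLogging(logging =>
        {
            // Emit gRPC log messages at Debug level and above.
            logging.AddFilter("Grpc", LogLevel.Debug);
        })
        .ConfigureWebHostDefaults(webBuilder =>
        {
            webBuilder.UseStartup<Startup>();
        });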

Logging and diagnostics in gRPC on .NET provides guidance for gathering diagnostic logs from a gRPC server.
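If your inference server is written in Python instead of .NET, the gRPC core library reads the GRPC_VERBOSITY and GRPC_TRACE environment variables when it initializes. This sketch assumes the grpcio package. enable_grpc_debug_logging is a hypothetical helper, and you should call it before you import grpc. Setting the same variables in the module's container environment has the same effect.

```python
import logging
import os


def enable_grpc_debug_logging(trace="all"):
    """Turn on verbose gRPC core tracing and Python-layer gRPC logging.

    Call this before "import grpc": gRPC core reads these environment
    variables when it initializes. "all" is very noisy; a narrower
    comma-separated tracer list reduces the volume.
    """
    os.environ["GRPC_VERBOSITY"] = "DEBUG"
    os.environ["GRPC_TRACE"] = trace
    logging.basicConfig(level=logging.INFO)
    # grpcio's Python modules log under the "grpc" logger hierarchy.
    logging.getLogger("grpc").setLevel(logging.DEBUG)
```

Core traces go to stderr, so they appear in the container logs next to your own log output.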

A failed gRPC connection

If a graph is active and streaming from a camera, the connection will be maintained by Live Video Analytics.

Monitoring and balancing the load of CPU and GPU resources when these resources become bottlenecks

Live Video Analytics doesn't monitor hardware resources or provide any hardware resource monitoring. Developers have to use the hardware manufacturer's monitoring solutions. However, if you use Kubernetes containers, you can monitor the device by using the Kubernetes dashboard.

The gRPC for .NET Core documentation also shares some valuable information on Performance best practices and Load balancing.

Troubleshoot an inference server when it doesn't receive any frames and you're receiving an "unknown" protocol error

There are several things you can do to get more information about the problem.

  • Include the MediaPipeline log category in the desired properties of the Live Video Analytics module, and ensure that the log level is set to Information.

  • To test network connectivity, you can run the following command from the edge device.

    sudo docker exec lvaEdge /bin/bash -c "apt update; apt install -y telnet; telnet <inference-host> <inference-port>" 
    

    If the command outputs a short string of jumbled text, telnet was able to open a connection to your inference server and open a binary gRPC channel. If you don't see this, telnet will report a network error.

  • In your inference server, you can enable additional logging in the gRPC library. This can give additional information about the gRPC channel itself. How to do this varies by language; here are instructions for C#.
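If you'd rather not install telnet inside the lvaEdge container, you can run a plain TCP check from Python instead. This is a sketch. can_connect is a hypothetical helper, and <inference-host> and <inference-port> are the same placeholders as in the command above. The check confirms only that a TCP connection opens, not that the server speaks gRPC. Run it from the same network context as lvaEdge, because name resolution on the device host can differ from the name resolution inside the container.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```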

Picking more images from the gRPC buffer without sending back a result for the first buffer

As part of the gRPC data transfer contract, all messages that Live Video Analytics sends to the gRPC inferencing server should be acknowledged. Not acknowledging the receipt of an image frame breaks the data contract and can result in undesired situations.

To use your gRPC server with Live Video Analytics, you can use shared memory for best performance. This requires you to use the Linux shared memory capabilities exposed by your programming language or environment.

  1. Open the Linux shared memory handle.

  2. Upon receiving a frame, access the address offset within the shared memory.

  3. Acknowledge the frame processing completion so its memory can be reclaimed by Live Video Analytics.

    Note

    If you delay acknowledging the receipt of the frame to Live Video Analytics for a long time, the shared memory can become full, which causes data drops.

  4. Store each frame in a data structure of your choice (list, array, and so on) on the inferencing server.

  5. You can then run your processing logic when you have the desired number of image frames.

  6. Return the inferencing result back to Live Video Analytics when ready.
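The steps above can be sketched as a small batching class. This example is illustrative only. It assumes that the shared memory region is reachable as a file path (for example, under /dev/shm). It also assumes that each frame message carries a sequence number, an offset, and a length. The ack and process callables are hypothetical stand-ins for the gRPC acknowledgment and for your inference logic.

```python
import mmap


class FrameBatcher:
    """Copy frames out of shared memory, acknowledge each one, process in batches."""

    def __init__(self, shm_path, size, batch_size, ack, process):
        self._file = open(shm_path, "r+b")                  # 1. open the handle
        self._mem = mmap.mmap(self._file.fileno(), size)
        self._batch_size = batch_size
        self._ack = ack          # hypothetical: sends the acknowledgment to LVA
        self._process = process  # hypothetical: runs inference on a list of frames
        self._frames = []

    def on_frame(self, sequence, offset, length):
        """Handle one frame; return a result once a full batch is collected."""
        frame = bytes(self._mem[offset:offset + length])   # 2. read at offset (copy)
        self._ack(sequence)                                # 3. ack so memory is reclaimed
        self._frames.append(frame)                         # 4. keep our own copy
        if len(self._frames) < self._batch_size:
            return None
        batch, self._frames = self._frames, []
        return self._process(batch)                        # 5-6. process, return result

    def close(self):
        self._mem.close()
        self._file.close()
```

The frame is copied before it is acknowledged, because after the acknowledgment Live Video Analytics can reuse that region of shared memory. The copy costs some memory bandwidth, but it keeps acknowledgments prompt and avoids the data drops described in the note.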

Next steps

Tutorial: Event-based video recording to cloud and playback from cloud