Analyze videos in near real time

This article demonstrates how to perform near real-time analysis on frames taken from a live video stream by using the Computer Vision API. The basic elements of such an analysis are:

  • Acquiring frames from a video source.
  • Selecting which frames to analyze.
  • Submitting these frames to the API.
  • Consuming each analysis result that's returned from the API call.

The samples in this article are written in C#. To access the code, go to the Video frame analysis sample page on GitHub.

Approaches to running near real-time analysis

You can solve the problem of running near real-time analysis on video streams by using a variety of approaches. This article outlines three of them, in increasing levels of sophistication.

Design an infinite loop

The simplest design for near real-time analysis is an infinite loop. In each iteration of the loop, you grab a frame, analyze it, and then consume the result:

while (true)
{
    Frame f = GrabFrame();
    if (ShouldAnalyze(f))
    {
        AnalysisResult r = await Analyze(f);
        ConsumeResult(r);
    }
}

If your analysis consists of a lightweight, client-side algorithm, this approach is suitable. However, when the analysis occurs in the cloud, the resulting latency means that an API call might take several seconds. During that time, you're not capturing images, and your thread is essentially doing nothing. Your maximum frame rate is limited by the latency of the API calls; for example, if each call takes 800 milliseconds round trip, this loop can process at most about 1.25 frames per second.

Allow the API calls to run in parallel

Although a simple, single-threaded loop makes sense for a lightweight, client-side algorithm, it doesn't fit well with the latency of a cloud API call. The solution is to allow the long-running API call to run in parallel with the frame grabbing. In C#, you can do this by using task-based parallelism. For example, you can run the following code:

while (true)
{
    Frame f = GrabFrame();
    if (ShouldAnalyze(f))
    {
        var t = Task.Run(async () =>
        {
            AnalysisResult r = await Analyze(f);
            ConsumeResult(r);
        });
    }
}

With this approach, you launch each analysis in a separate task. The task can run in the background while you continue grabbing new frames. This avoids blocking the main thread while you wait for an API call to return. However, the approach has some disadvantages:

  • You lose some of the guarantees that the simple version provided. That is, multiple API calls might occur in parallel, and the results might be returned in the wrong order.
  • Multiple threads might enter the ConsumeResult() function simultaneously, which can be dangerous if the function isn't thread-safe (see the sketch after this list).
  • Finally, this simple code doesn't keep track of the tasks that get created, so exceptions silently disappear. You therefore need to add a "consumer" thread that tracks the analysis tasks, raises exceptions, kills long-running tasks, and ensures that the results are consumed in the correct order, one at a time.
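
One narrow mitigation for the second point is to serialize access to ConsumeResult(). The following is a minimal sketch, not part of the sample library; it assumes ConsumeResult() is synchronous, and it doesn't fix the ordering or exception-tracking problems:

private static readonly object resultLock = new object();

// Wrap the non-thread-safe ConsumeResult() call so that only one
// task at a time can enter it. Results can still arrive out of order.
private static void ConsumeResultSafely(AnalysisResult r)
{
    lock (resultLock)
    {
        ConsumeResult(r);
    }
}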

Design a producer-consumer system

For your final approach, you design a "producer-consumer" system. You build a producer thread that looks similar to the previous infinite loop. However, instead of consuming the analysis results as soon as they're available, the producer simply places the tasks in a queue to keep track of them.
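
The producer code below references a ResultWrapper type that pairs each frame with either its analysis result or the exception that the analysis threw. The sample library on GitHub defines its own version; the following is a minimal sketch of what such a wrapper might look like:

// Minimal sketch of a wrapper that carries a frame together with
// either its analysis result or the exception the analysis threw.
// (The GitHub sample defines its own, richer version.)
class ResultWrapper
{
    public ResultWrapper(Frame frame) { Frame = frame; }

    public Frame Frame { get; }
    public AnalysisResult Analysis { get; set; }
    public Exception Exception { get; set; }
}

With a wrapper like that in place, the producer loop looks like this: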

// Queue that will contain the API call tasks.
var taskQueue = new BlockingCollection<Task<ResultWrapper>>();

// Producer thread.
while (true)
{
    // Grab a frame.
    Frame f = GrabFrame();

    // Decide whether to analyze the frame.
    if (ShouldAnalyze(f))
    {
        // Start a task that will run in parallel with this thread.
        var analysisTask = Task.Run(async () =>
        {
            // Put the frame and the result/exception into a wrapper object.
            var output = new ResultWrapper(f);
            try
            {
                output.Analysis = await Analyze(f);
            }
            catch (Exception e)
            {
                output.Exception = e;
            }
            return output;
        });

        // Push the task onto the queue.
        taskQueue.Add(analysisTask);
    }
}

You also create a consumer thread, which takes tasks off the queue, waits for them to finish, and either displays the result or rethrows the exception that was thrown. By using the queue, you can guarantee that results are consumed one at a time, in the correct order, without limiting the maximum frame rate of the system.

// Consumer thread.
while (true)
{
    // Get the oldest task.
    Task<ResultWrapper> analysisTask = taskQueue.Take();
 
    // Wait until the task is completed.
    var output = await analysisTask;

    // Consume the exception or result.
    if (output.Exception != null)
    {
        throw output.Exception;
    }
    else
    {
        ConsumeResult(output.Analysis);
    }
}
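
The producer and consumer loops shown above both run forever, so in practice they must run concurrently on separate threads. A minimal way to wire them together, assuming you've wrapped the two loops in hypothetical ProducerLoop() and ConsumerLoop() methods:

// Run the producer and consumer loops concurrently on the thread
// pool. Both loops are infinite, so a real app would also thread a
// CancellationToken through them to allow a clean shutdown.
Task producer = Task.Run(() => ProducerLoop());
Task consumer = Task.Run(() => ConsumerLoop());
await Task.WhenAll(producer, consumer);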

Implement the solution

Get started quickly

To help get your app up and running as quickly as possible, we've implemented the system described in the preceding section. It's intended to be flexible enough to accommodate many scenarios, while being easy to use. To access the code, go to the Video frame analysis sample page on GitHub.

The library contains the FrameGrabber class, which implements the producer-consumer system discussed above to process video frames from a webcam. Users can specify the exact form of the API call, and the class uses events to let the calling code know when a new frame is acquired or when a new analysis result is available.

To illustrate some of the possibilities, we've provided two sample apps that use the library.

The first sample app is a simple console app that grabs frames from the default webcam and submits them to the Face service for face detection. A simplified version of the app is reproduced in the following code:

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Vision.Face;
using Microsoft.Azure.CognitiveServices.Vision.Face.Models;
using VideoFrameAnalyzer;

namespace BasicConsoleSample
{
    internal class Program
    {
        const string ApiKey = "<your API key>";
        const string Endpoint = "https://<your API region>.api.cognitive.azure.cn";

        private static async Task Main(string[] args)
        {
            // Create grabber.
            FrameGrabber<DetectedFace[]> grabber = new FrameGrabber<DetectedFace[]>();

            // Create Face Client.
            FaceClient faceClient = new FaceClient(new ApiKeyServiceClientCredentials(ApiKey))
            {
                Endpoint = Endpoint
            };

            // Set up a listener for when we acquire a new frame.
            grabber.NewFrameProvided += (s, e) =>
            {
                Console.WriteLine($"New frame acquired at {e.Frame.Metadata.Timestamp}");
            };

            // Set up a Face API call.
            grabber.AnalysisFunction = async frame =>
            {
                Console.WriteLine($"Submitting frame acquired at {frame.Metadata.Timestamp}");
                // Encode image and submit to Face service.
                return (await faceClient.Face.DetectWithStreamAsync(frame.Image.ToMemoryStream(".jpg"))).ToArray();
            };

            // Set up a listener for when we receive a new result from an API call.
            grabber.NewResultAvailable += (s, e) =>
            {
                if (e.TimedOut)
                    Console.WriteLine("API call timed out.");
                else if (e.Exception != null)
                    Console.WriteLine("API call threw an exception.");
                else
                    Console.WriteLine($"New result received for frame acquired at {e.Frame.Metadata.Timestamp}. {e.Analysis.Length} faces detected");
            };

            // Tell grabber when to call the API.
            // See also TriggerAnalysisOnPredicate
            grabber.TriggerAnalysisOnInterval(TimeSpan.FromMilliseconds(3000));

            // Start running in the background.
            await grabber.StartProcessingCameraAsync();

            // Wait for key press to stop.
            Console.WriteLine("Press any key to stop...");
            Console.ReadKey();

            // Stop, blocking until done.
            await grabber.StopProcessingAsync();
        }
    }
}
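
The comment in the code above mentions TriggerAnalysisOnPredicate as an alternative to analyzing on a fixed interval. With it, you supply a predicate that's evaluated for each frame, and analysis is triggered only when the predicate returns true. For example, here's a sketch that assumes the frame metadata exposes a running frame index:

// Analyze roughly every 30th frame instead of using a timer.
// (Assumes frame.Metadata.Index is a running frame counter.)
grabber.TriggerAnalysisOnPredicate(frame => frame.Metadata.Index % 30 == 0);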

The second sample app is a bit more interesting. It allows you to choose which API to call on the video frames. On the left side, the app shows a preview of the live video. On the right, it overlays the most recent API result on the corresponding frame.

In most modes, there's a visible delay between the live video on the left and the visualized analysis on the right. This delay is the time it takes to make the API call. The exception is the "EmotionsWithClientFaceDetect" mode, which performs face detection locally on the client computer by using OpenCV before submitting any images to Azure Cognitive Services.

By using this approach, you can visualize the detected face immediately and then update the emotions later, after the API call returns. This demonstrates the possibility of a "hybrid" approach: some simple processing can be performed on the client, and Cognitive Services APIs can then augment it with more advanced analysis when necessary.
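
As a rough illustration of that hybrid idea (not the sample's actual code), you could run a fast local OpenCV face detector and submit a frame to the cloud only when it contains a face. This sketch assumes the OpenCvSharp package and a Haar cascade file deployed alongside the app:

using OpenCvSharp;

// Fast, local face detection with OpenCV. Only frames that contain
// at least one face would be submitted to the cloud API.
// (Sketch only; assumes "haarcascade_frontalface_alt2.xml" is
// shipped with the app, as in the OpenCV distribution.)
static readonly CascadeClassifier Cascade =
    new CascadeClassifier("haarcascade_frontalface_alt2.xml");

static bool ContainsFace(Mat image)
{
    using (var gray = image.CvtColor(ColorConversionCodes.BGR2GRAY))
    {
        return Cascade.DetectMultiScale(gray).Length > 0;
    }
}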

(Figure: the LiveCameraSample app displaying a tagged image.)

Integrate the samples into your codebase

To get started with this sample, do the following:

  1. Create an Azure account. If you already have one, you can skip to the next step.
  2. Create resources for Computer Vision and Face in the Azure portal to get your key and endpoint. Make sure to select the free tier (F0) during setup.
    • Computer Vision
    • Face
    After the resources are deployed, click Go to resource to collect your key and endpoint for each resource.
  3. Clone the Cognitive-Samples-VideoFrameAnalysis GitHub repo.
  4. Open the sample in Visual Studio 2015 or later, and then build and run the sample applications:
    • For BasicConsoleSample, the Face key is hard-coded directly in BasicConsoleSample/Program.cs.
    • For LiveCameraSample, enter the keys in the Settings pane of the app. They're persisted across sessions as user data.

When you're ready to integrate the samples, reference the VideoFrameAnalyzer library from your own projects.

The image-, voice-, video-, and text-understanding capabilities of VideoFrameAnalyzer use Azure Cognitive Services. Microsoft receives the images, audio, video, and other data that you upload (via this app) and might use them for service-improvement purposes. We ask for your help in protecting the people whose data your app sends to Azure Cognitive Services.

Summary

In this article, you learned how to run near real-time analysis on live video streams by using the Face and Computer Vision services. You also learned how to use our sample code to get started.

Feel free to provide feedback and suggestions in the GitHub repository. To provide broader API feedback, go to our UserVoice site.