Example: How to Analyze Videos in Real-time

This guide will demonstrate how to perform near-real-time analysis on frames taken from a live video stream. The basic components in such a system are:

  • Acquire frames from a video source
  • Select which frames to analyze
  • Submit these frames to the API
  • Consume each analysis result that is returned from the API call

These samples are written in C# and the code can be found on GitHub here: https://github.com/Microsoft/Cognitive-Samples-VideoFrameAnalysis.

The Approach

There are multiple ways to solve the problem of running near-real-time analysis on video streams. We will start by outlining three approaches in increasing levels of sophistication.

A Simple Approach

The simplest design for a near-real-time analysis system is an infinite loop, where each iteration grabs a frame, analyzes it, and then consumes the result:

while (true)
{
    Frame f = GrabFrame();
    if (ShouldAnalyze(f))
    {
        AnalysisResult r = await Analyze(f);
        ConsumeResult(r);
    }
}

If our analysis consisted of a lightweight client-side algorithm, this approach would be suitable. However, when analysis happens in the cloud, the latency involved means that an API call might take several seconds. During this time, we are not capturing images, and our thread is essentially doing nothing. Our maximum frame-rate is limited by the latency of the API calls.
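To make this limitation concrete, you can time the call inside the loop. Below is a minimal sketch, using the same hypothetical Frame, GrabFrame, Analyze, and ConsumeResult placeholders as above:

while (true)
{
    Frame f = GrabFrame();

    // Time the round-trip to the API.
    var sw = System.Diagnostics.Stopwatch.StartNew();
    AnalysisResult r = await Analyze(f);
    sw.Stop();

    // At e.g. 2000 ms per call, this loop can process at most
    // 1000.0 / 2000 = 0.5 frames per second.
    Console.WriteLine($"Latency: {sw.ElapsedMilliseconds} ms; " +
        $"max rate: {1000.0 / sw.ElapsedMilliseconds:F2} fps");

    ConsumeResult(r);
}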

Parallelizing API Calls

While a simple single-threaded loop makes sense for a lightweight client-side algorithm, it doesn't fit well with the latency involved in cloud API calls. The solution to this problem is to allow the long-running API calls to execute in parallel with the frame-grabbing. In C#, we could achieve this using Task-based parallelism, for example:

while (true)
{
    Frame f = GrabFrame();
    if (ShouldAnalyze(f))
    {
        var t = Task.Run(async () => 
        {
            AnalysisResult r = await Analyze(f);
            ConsumeResult(r);
        });
    }
}

This code launches each analysis in a separate Task, which can run in the background while we continue grabbing new frames. With this method we avoid blocking the main thread while waiting for an API call to return, but we have lost some of the guarantees that the simple version provided. Multiple API calls might occur in parallel, and the results might get returned in the wrong order. This could also cause multiple threads to enter the ConsumeResult() function simultaneously, which could be dangerous if the function is not thread-safe. Finally, this simple code does not keep track of the Tasks that get created, so exceptions will silently disappear. Therefore, the final step is to add a "consumer" thread that will track the analysis tasks, raise exceptions, kill long-running tasks, and ensure that the results get consumed in the correct order.

A Producer-Consumer Design

In our final "producer-consumer" system, we have a producer thread that looks similar to our previous infinite loop. However, instead of consuming analysis results as soon as they are available, the producer simply puts the tasks into a queue to keep track of them.

// Queue that will contain the API call tasks. 
var taskQueue = new BlockingCollection<Task<ResultWrapper>>();
     
// Producer thread. 
while (true)
{
    // Grab a frame. 
    Frame f = GrabFrame();
 
    // Decide whether to analyze the frame. 
    if (ShouldAnalyze(f))
    {
        // Start a task that will run in parallel with this thread. 
        var analysisTask = Task.Run(async () => 
        {
            // Put the frame, and the result/exception into a wrapper object.
            var output = new ResultWrapper(f);
            try
            {
                output.Analysis = await Analyze(f);
            }
            catch (Exception e)
            {
                output.Exception = e;
            }
            return output;
        });
        
        // Push the task onto the queue. 
        taskQueue.Add(analysisTask);
    }
}

We also have a consumer thread that takes tasks off the queue, waits for them to finish, and either displays the result or raises the exception that was thrown. By using the queue, we can guarantee that results get consumed one at a time, in the correct order, without limiting the maximum frame-rate of the system.

// Consumer thread. 
while (true)
{
    // Get the oldest task. 
    Task<ResultWrapper> analysisTask = taskQueue.Take();
 
    // Await until the task is completed. 
    var output = await analysisTask;
     
    // Consume the exception or result. 
    if (output.Exception != null)
    {
        throw output.Exception;
    }
    else
    {
        ConsumeResult(output.Analysis);
    }
}
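Both the producer and consumer snippets assume a small ResultWrapper class that pairs each frame with either its analysis result or the exception the call produced. A minimal sketch of the shape assumed here, using the same hypothetical Frame and AnalysisResult types as above:

class ResultWrapper
{
    public ResultWrapper(Frame frame)
    {
        Frame = frame;
    }

    // The frame that was submitted for analysis.
    public Frame Frame { get; }

    // Set by the producer task if the API call succeeds.
    public AnalysisResult Analysis { get; set; }

    // Set instead of Analysis if the API call throws.
    public Exception Exception { get; set; }
}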

Implementing the Solution

Getting Started

To get your app up and running as quickly as possible, you will use a flexible implementation of the system described above. To access the code, go to https://github.com/Microsoft/Cognitive-Samples-VideoFrameAnalysis.

The library contains the class FrameGrabber, which implements the producer-consumer system discussed above to process video frames from a webcam. The user can specify the exact form of the API call, and the class uses events to let the calling code know when a new frame is acquired or a new analysis result is available.

To illustrate some of the possibilities, there are two sample apps that use the library. The first is a simple console app, and a simplified version of it is reproduced below. It grabs frames from the default webcam, and submits them to the Face service for face detection.

using System;
/* See:
 * https://github.com/Microsoft/Cognitive-Samples-VideoFrameAnalysis
 * Compile and add reference to VideoFrameAnalyzer.dll.
 * Install NuGet package OpenCVSharp.
 */
using VideoFrameAnalyzer;
// Install NuGet package Microsoft.Azure.CognitiveServices.Vision.Face.
using Microsoft.Azure.CognitiveServices.Vision.Face;
using Microsoft.Azure.CognitiveServices.Vision.Face.Models;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace VideoFrameConsoleApplication
{
    class Program
    {
        static string SUBSCRIPTION_KEY = Environment.GetEnvironmentVariable("FACE_SUBSCRIPTION_KEY");
        static string ENDPOINT = Environment.GetEnvironmentVariable("FACE_ENDPOINT");

        static void Main(string[] args)
        {
            IFaceClient client = new FaceClient(new ApiKeyServiceClientCredentials(SUBSCRIPTION_KEY)) { Endpoint = ENDPOINT };

            // Define this local function in Main so that it closes over the client.
            async Task<DetectedFace[]> Detect(VideoFrame frame)
            {
                return (await client.Face.DetectWithStreamAsync(
                    frame.Image.ToMemoryStream(".jpg"), detectionModel: DetectionModel.Detection02)).ToArray();
            }

            // Create grabber, with analysis type Face[]. 
            FrameGrabber<DetectedFace[]> grabber = new FrameGrabber<DetectedFace[]>();

            // Set up our Face API call.
            grabber.AnalysisFunction = Detect;

            // Set up a listener for when we receive a new result from an API call. 
            grabber.NewResultAvailable += (s, e) =>
            {
                if (e.Analysis != null)
                    Console.WriteLine("New result received for frame acquired at {0}. {1} faces detected", e.Frame.Metadata.Timestamp, e.Analysis.Length);
            };

            // Tell grabber to call the Face API every 3 seconds.
            grabber.TriggerAnalysisOnInterval(TimeSpan.FromSeconds(3));

            // Start running.
            grabber.StartProcessingCameraAsync().Wait();

            // Wait for keypress to stop
            Console.WriteLine("Press any key to stop...");
            Console.ReadKey();

            // Stop, blocking until done.
            grabber.StopProcessingAsync().Wait();
        }
    }
}

The second sample app is a bit more interesting, and allows you to choose which API to call on the video frames. On the left-hand side, the app shows a preview of the live video; on the right-hand side, it shows the most recent API result overlaid on the corresponding frame.

In most modes, there will be a visible delay between the live video on the left, and the visualized analysis on the right. This delay is the time taken to make the API call. One exception is the "EmotionsWithClientFaceDetect" mode, which performs face detection locally on the client computer using OpenCV, before submitting any images to Cognitive Services. This way, we can visualize the detected face immediately and then update the emotions once the API call returns. This is an example of a "hybrid" approach, where the client can perform some simple processing, and Cognitive Services APIs can augment this with more advanced analysis when necessary.

(Screenshot: the LiveCameraSample app, with the live video preview on the left and the visualized analysis overlaid on the right.)
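The hybrid pattern is easy to sketch. The fragment below is a minimal illustration, not the sample's actual code: it assumes the OpenCvSharp NuGet package, a Haar-cascade file that ships with OpenCV, and a frame whose Image property is an OpenCvSharp Mat (as in the VideoFrameAnalyzer samples above):

using OpenCvSharp;

// A cheap local face detector (the cascade file ships with OpenCV).
var cascade = new CascadeClassifier("haarcascade_frontalface_alt2.xml");

bool ContainsFaces(Mat image)
{
    using (var gray = new Mat())
    {
        // Haar cascades operate on grayscale images.
        Cv2.CvtColor(image, gray, ColorConversionCodes.BGR2GRAY);
        return cascade.DetectMultiScale(gray, 1.1, 3).Length > 0;
    }
}

// Only pay for a cloud call when the cheap local check finds a face.
if (ContainsFaces(frame.Image))
{
    // Submit the frame to the Face service as before.
}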

Integrating into your codebase

To get started with this sample, follow these steps:

  1. Create an Azure account. If you already have one, you can skip to the next step.
  2. Create resources for Computer Vision and Face in the Azure portal to get your key and endpoint. Make sure to select the free tier (F0) during setup.
    • Computer Vision
    • Face
    After the resources are deployed, click Go to resource to collect your key and endpoint for each resource.
  3. Clone the Cognitive-Samples-VideoFrameAnalysis GitHub repo.
  4. Open the sample in Visual Studio, and build and run the sample applications:
    • For BasicConsoleSample, the Face key is hard-coded directly in BasicConsoleSample/Program.cs.
    • For LiveCameraSample, the keys should be entered into the Settings pane of the app. They will be persisted across sessions as user data.

When you're ready to integrate, reference the VideoFrameAnalyzer library from your own projects.
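A condensed sketch of what that integration might look like (MyResult and MyAnalyzeAsync are hypothetical placeholders for your own result type and analysis call; the FrameGrabber members are the same ones used in the console sample above):

using System;
using System.Threading.Tasks;
using VideoFrameAnalyzer;

class MyResult { /* your analysis output */ }

class MyIntegration
{
    public static async Task RunAsync()
    {
        var grabber = new FrameGrabber<MyResult>();

        // Plug in your own analysis function.
        grabber.AnalysisFunction = MyAnalyzeAsync;

        // Results arrive one at a time, in order, via this event.
        grabber.NewResultAvailable += (s, e) =>
            Console.WriteLine("Result for frame acquired at {0}", e.Frame.Metadata.Timestamp);

        grabber.TriggerAnalysisOnInterval(TimeSpan.FromSeconds(1));
        await grabber.StartProcessingCameraAsync();
    }

    static Task<MyResult> MyAnalyzeAsync(VideoFrame frame)
    {
        // Call whichever API you need here.
        return Task.FromResult(new MyResult());
    }
}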

Summary

In this guide, you learned how to run near-real-time analysis on live video streams using the Face, Computer Vision, and Emotion APIs, and how to use our sample code to get started.

Feel free to provide feedback and suggestions in the GitHub repository or, for broader API feedback, on our UserVoice site.