Redact faces with Azure Media Analytics

Overview

Azure Media Redactor is an Azure Media Analytics media processor (MP) that offers scalable face redaction in the cloud. Face redaction enables you to modify your video to blur the faces of selected individuals. You may want to use the face redaction service in public safety and news media scenarios. A few minutes of footage that contains multiple faces can take hours to redact manually, but with this service the process requires just a few simple steps. For more information, see this blog.

This article gives details about Azure Media Redactor and shows how to use it with the Media Services SDK for .NET.

Face redaction modes

Face redaction works by detecting faces in every frame of video and tracking the face object both forwards and backwards in time, so that the same individual can be blurred from other angles as well. The automated redaction process is complex and does not always produce exactly the desired output, so Media Analytics provides a couple of ways to modify the final output.

In addition to a fully automatic mode, there is a two-pass workflow that allows the selection/de-selection of found faces via a list of IDs. To make arbitrary per-frame adjustments, the MP uses a metadata file in JSON format. This workflow is split into Analyze and Redact modes. You can combine the two modes in a single pass that runs both tasks in one job; this mode is called Combined.

Combined mode

Combined mode produces a redacted MP4 automatically, without any manual input.

Stage | File name | Notes
Input asset | foo.bar | Video in WMV, MOV, or MP4 format
Input config | Job configuration preset | {'version':'1.0', 'options': {'mode':'combined'}}
Output asset | foo_redacted.mp4 | Video with blurring applied

Input example:

View this video.

Output example:

View this video.

Analyze mode

The analyze pass of the two-pass workflow takes a video input and produces a JSON file of face locations, plus JPG images of each detected face.

Stage | File name | Notes
Input asset | foo.bar | Video in WMV, MOV, or MP4 format
Input config | Job configuration preset | {'version':'1.0', 'options': {'mode':'analyze'}}
Output asset | foo_annotations.json | Annotation data of face locations in JSON format. This can be edited by the user to modify the blurring bounding boxes. See the sample below.
Output asset | foo_thumb%06d.jpg [foo_thumb000001.jpg, foo_thumb000002.jpg] | A cropped JPG of each detected face, where the number indicates the labelId of the face

Output example:

    {
      "version": 1,
      "timescale": 24000,
      "offset": 0,
      "framerate": 23.976,
      "width": 1280,
      "height": 720,
      "fragments": [
        {
          "start": 0,
          "duration": 48048,
          "interval": 1001,
          "events": [
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [
              {
                "index": 13,
                "id": 1138,
                "x": 0.29537,
                "y": -0.18987,
                "width": 0.36239,
                "height": 0.80335
              },
              {
                "index": 13,
                "id": 2028,
                "x": 0.60427,
                "y": 0.16098,
                "width": 0.26958,
                "height": 0.57943
              }
            ],

    … truncated
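The annotation JSON is plain data, so it can be post-processed with a few lines of code. The following is a minimal sketch, assuming the Newtonsoft.Json package (an assumption; this article's sample does not use it) and the foo_annotations.json file from the table above. It prints the time at which each tracked face ID first appears:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using Newtonsoft.Json.Linq;

    // Minimal sketch: report when each tracked face ID first appears.
    class ListFaceIds
    {
        static void Main()
        {
            JObject root = JObject.Parse(File.ReadAllText("foo_annotations.json"));
            double timescale = (double)root["timescale"];
            var seen = new HashSet<long>();

            foreach (JToken fragment in root["fragments"])
            {
                long start = (long)fragment["start"];
                long interval = (long?)fragment["interval"] ?? 0;
                JToken events = fragment["events"];
                if (events == null) continue;

                int i = 0;
                foreach (JToken entry in events)      // one entry per interval
                {
                    foreach (JToken face in entry)    // zero or more faces; [] means none detected
                    {
                        long id = (long)face["id"];
                        if (seen.Add(id))
                        {
                            double seconds = (start + (long)i * interval) / timescale;
                            Console.WriteLine($"Face {id} first appears at {seconds:F2}s");
                        }
                    }
                    i++;
                }
            }
        }
    }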

Redact mode

The second pass of the workflow takes a larger number of inputs, which must be combined into a single asset.

These inputs include a list of IDs to blur, the original video, and the annotations JSON. This mode uses the annotations to apply blurring to the input video.

The output from the Analyze pass does not include the original video. The video needs to be uploaded into the input asset for the Redact mode task and selected as the primary file, as the sketch below shows.
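As a minimal sketch of that setup (a fragment that reuses the _context service context and the upload pattern from the .NET sample later in this article; the local paths are hypothetical):

    // Build the single input asset for a Redact mode task: the original video,
    // the annotations JSON, and (optionally) the IDList. Paths are hypothetical.
    IAsset redactInput = _context.Assets.Create("Redact input", AssetCreationOptions.None);

    foreach (string path in new[] { @"C:\footage\foo.bar",
                                    @"C:\footage\foo_annotations.json",
                                    @"C:\footage\foo_IDList.txt" })
    {
        IAssetFile assetFile = redactInput.AssetFiles.Create(Path.GetFileName(path));
        assetFile.Upload(path);

        // The original video must be marked as the primary file.
        if (Path.GetFileName(path) == "foo.bar")
        {
            assetFile.IsPrimary = true;
            assetFile.Update();
        }
    }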

Stage | File name | Notes
Input asset | foo.bar | Video in WMV, MOV, or MP4 format. Same video as in step 1.
Input asset | foo_annotations.json | Annotations metadata file from phase one, with optional modifications.
Input asset | foo_IDList.txt (optional) | Optional newline-separated list of face IDs to redact. If left blank, all faces are blurred.
Input config | Job configuration preset | {'version':'1.0', 'options': {'mode':'redact'}}
Output asset | foo_redacted.mp4 | Video with blurring applied based on annotations

Example output

This is the output from an IDList with one ID selected.

View this video.

Example foo_IDList.txt

 1
 2
 3

Blur types

In the Combined or Redact mode, there are five different blur modes you can choose from via the JSON input configuration: Low, Med, High, Box, and Black. By default, Med is used.

You can find samples of the blur types below.

Example JSON:

    {'version':'1.0', 'options': {'Mode': 'Combined', 'BlurType': 'High'}}

Sample images of each blur type: Low, Med, High, Box, and Black.

Elements of the output JSON file

The Redaction MP provides high-precision face location detection and tracking that can detect up to 64 human faces in a video frame. Frontal faces provide the best results, while side faces and small faces (less than or equal to 24x24 pixels) are challenging.

The job produces a JSON output file that contains metadata about detected and tracked faces. The metadata includes coordinates indicating the location of faces, as well as a face ID number indicating the tracking of that individual. Face ID numbers are prone to reset when the frontal face is lost or overlapped in the frame, resulting in some individuals being assigned multiple IDs.

The output JSON includes the following elements:

Root JSON elements

Element | Description
version | The version of the Video API.
timescale | "Ticks" per second of the video.
offset | The time offset for timestamps. In version 1.0 of the Video APIs, this is always 0. In future scenarios, this value may change.
width, height | The width and height of the output video frame, in pixels.
framerate | Frames per second of the video.
fragments | The metadata is chunked up into different segments called fragments. Each fragment contains a start, duration, interval number, and event(s).
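For example, in the sample JSON above, timescale is 24000 and framerate is 23.976, so one frame lasts 24000 / 23.976 ≈ 1001 ticks. This matches the fragment's interval of 1001, so each events entry corresponds to exactly one frame.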

Fragments JSON elements

Element | Description
start | The start time of the first event, in "ticks."
duration | The length of the fragment, in "ticks."
index | (Applies to Azure Media Redactor only) Defines the frame index of the current event.
interval | The interval of each event entry within the fragment, in "ticks."
events | Each event contains the faces detected and tracked within that time duration. It is an array of events: the outer array represents one interval of time, and the inner array consists of 0 or more events that happened at that point in time. An empty bracket [] means no faces were detected.
id | The ID of the face that is being tracked. This number may inadvertently change if a face becomes undetected. A given individual should have the same ID throughout the overall video, but this cannot be guaranteed due to limitations in the detection algorithm (occlusion, and so on).
x, y | The upper-left X and Y coordinates of the face bounding box, on a normalized scale of 0.0 to 1.0. X and Y coordinates are always relative to landscape orientation, so if you have a portrait video (or an upside-down one, in the case of iOS), you have to transpose the coordinates accordingly.
width, height | The width and height of the face bounding box, on a normalized scale of 0.0 to 1.0.
facesDetected | Found at the end of the JSON results; summarizes the number of faces that the algorithm detected during the video. Because the IDs can be reset inadvertently if a face becomes undetected (for example, the face goes off screen or looks away), this number may not always equal the true number of faces in the video.
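Because the box coordinates are normalized, converting them to pixels only requires the width and height from the root element. A minimal sketch using the first face in the sample JSON above (the variable names are illustrative only):

    // Convert a normalized face bounding box to pixel coordinates.
    // Frame size and box values are taken from the sample JSON above.
    int frameWidth = 1280, frameHeight = 720;

    double x = 0.29537, y = -0.18987;   // face id 1138; y is negative because
    double w = 0.36239, h = 0.80335;    // the box is clipped at the top edge

    int leftPx   = (int)Math.Round(x * frameWidth);    //  378
    int topPx    = (int)Math.Round(y * frameHeight);   // -137
    int widthPx  = (int)Math.Round(w * frameWidth);    //  464
    int heightPx = (int)Math.Round(h * frameHeight);   //  578

    Console.WriteLine($"Face 1138: {widthPx}x{heightPx} px at ({leftPx},{topPx})");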

.NET sample code

The following program shows how to:

  1. Create an asset and upload a media file into the asset.

  2. Create a job with a face redaction task, based on a configuration file that contains the following JSON preset:

            {
                'version':'1.0',
                'options': {
                    'mode':'combined'
                }
            }
    
  3. Download the output JSON files.

Create and configure a Visual Studio project

Set up your development environment and populate the app.config file with connection information, as described in Media Services development with .NET.
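For reference, here is a minimal sketch of the appSettings block the sample below reads; the key names match the code, and the values are placeholders to replace with your own account details:

    <configuration>
      <appSettings>
        <!-- Placeholder values; replace with your own credentials. -->
        <add key="AMSAADTenantDomain" value="your-aad-tenant-domain" />
        <add key="AMSRESTAPIEndpoint" value="your-rest-api-endpoint" />
        <add key="AMSClientId" value="your-client-id" />
        <add key="AMSClientSecret" value="your-client-secret" />
      </appSettings>
    </configuration>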

Example

using System;
using System.Configuration;
using System.IO;
using System.Linq;
using Microsoft.WindowsAzure.MediaServices.Client;
using System.Threading;
using System.Threading.Tasks;

namespace FaceRedaction
{
    class Program
    {
        // Read values from the App.config file.
        private static readonly string _AADTenantDomain =
            ConfigurationManager.AppSettings["AMSAADTenantDomain"];
        private static readonly string _RESTAPIEndpoint =
            ConfigurationManager.AppSettings["AMSRESTAPIEndpoint"];
        private static readonly string _AMSClientId =
            ConfigurationManager.AppSettings["AMSClientId"];
        private static readonly string _AMSClientSecret =
            ConfigurationManager.AppSettings["AMSClientSecret"];

        // Field for service context.
        private static CloudMediaContext _context = null;

        static void Main(string[] args)
        {
            AzureAdTokenCredentials tokenCredentials =
                new AzureAdTokenCredentials(_AADTenantDomain,
                    new AzureAdClientSymmetricKey(_AMSClientId, _AMSClientSecret),
                    AzureEnvironments.AzureChinaCloudEnvironment);

            var tokenProvider = new AzureAdTokenProvider(tokenCredentials);

            _context = new CloudMediaContext(new Uri(_RESTAPIEndpoint), tokenProvider);

            // Run the FaceRedaction job.
            var asset = RunFaceRedactionJob(@"C:\supportFiles\FaceRedaction\SomeFootage.mp4",
                        @"C:\supportFiles\FaceRedaction\config.json");

            // Download the job output asset.
            DownloadAsset(asset, @"C:\supportFiles\FaceRedaction\Output");
        }

        static IAsset RunFaceRedactionJob(string inputMediaFilePath, string configurationFile)
        {
            // Create an asset and upload the input media file to storage.
            IAsset asset = CreateAssetAndUploadSingleFile(inputMediaFilePath,
            "My Face Redaction Input Asset",
            AssetCreationOptions.None);

            // Declare a new job.
            IJob job = _context.Jobs.Create("My Face Redaction Job");

            // Get a reference to Azure Media Redactor.
            string MediaProcessorName = "Azure Media Redactor";

            var processor = GetLatestMediaProcessorByName(MediaProcessorName);

            // Read configuration from the specified file.
            string configuration = File.ReadAllText(configurationFile);

            // Create a task with the encoding details, using a string preset.
            ITask task = job.Tasks.AddNew("My Face Redaction Task",
            processor,
            configuration,
            TaskOptions.None);

            // Specify the input asset.
            task.InputAssets.Add(asset);

            // Add an output asset to contain the results of the job.
            task.OutputAssets.AddNew("My Face Redaction Output Asset", AssetCreationOptions.None);

            // Use the following event handler to check job progress.  
            job.StateChanged += new EventHandler<JobStateChangedEventArgs>(StateChanged);

            // Launch the job.
            job.Submit();

            // Check job execution and wait for job to finish.
            Task progressJobTask = job.GetExecutionProgressTask(CancellationToken.None);

            progressJobTask.Wait();

            // If job state is Error, the event handling
            // method for job progress should log errors.  Here we check
            // for error state and exit if needed.
            if (job.State == JobState.Error)
            {
                ErrorDetail error = job.Tasks.First().ErrorDetails.First();
                Console.WriteLine(string.Format("Error: {0}. {1}",
                                error.Code,
                                error.Message));
                return null;
            }

            return job.OutputMediaAssets[0];
        }

        static IAsset CreateAssetAndUploadSingleFile(string filePath, string assetName, AssetCreationOptions options)
        {
            IAsset asset = _context.Assets.Create(assetName, options);

            var assetFile = asset.AssetFiles.Create(Path.GetFileName(filePath));
            assetFile.Upload(filePath);

            return asset;
        }

        static void DownloadAsset(IAsset asset, string outputDirectory)
        {
            foreach (IAssetFile file in asset.AssetFiles)
            {
                file.Download(Path.Combine(outputDirectory, file.Name));
            }
        }

        static IMediaProcessor GetLatestMediaProcessorByName(string mediaProcessorName)
        {
            var processor = _context.MediaProcessors
            .Where(p => p.Name == mediaProcessorName)
            .ToList()
            .OrderBy(p => new Version(p.Version))
            .LastOrDefault();

            if (processor == null)
                throw new ArgumentException(string.Format("Unknown media processor: {0}",
                                       mediaProcessorName));

            return processor;
        }

        static private void StateChanged(object sender, JobStateChangedEventArgs e)
        {
            Console.WriteLine("Job state changed event:");
            Console.WriteLine("  Previous state: " + e.PreviousState);
            Console.WriteLine("  Current state: " + e.CurrentState);

            switch (e.CurrentState)
            {
                case JobState.Finished:
                    Console.WriteLine();
                    Console.WriteLine("Job is finished.");
                    Console.WriteLine();
                    break;
                case JobState.Canceling:
                case JobState.Queued:
                case JobState.Scheduled:
                case JobState.Processing:
                    Console.WriteLine("Please wait...\n");
                    break;
                case JobState.Canceled:
                case JobState.Error:
                    // Cast sender as a job.
                    IJob job = (IJob)sender;
                    // Display or log error details as needed.
                    // LogJobStop(job.Id);
                    break;
                default:
                    break;
            }
        }
    }
}

Next steps

Media Services v3 (latest)

Check out the latest version of Azure Media Services!

Media Services v2 (legacy)

Azure Media Services Analytics Overview

Azure Media Analytics demos