Tutorial: Video and transcript moderation

In this tutorial, you will learn how to build a complete video and transcript moderation solution with machine-assisted moderation and human review integration.

This tutorial shows you how to:

  • Compress the input video(s) for faster processing
  • Moderate the video to get shots and frames with insights
  • Use the frame timestamps to create thumbnails (images)
  • Submit timestamps and thumbnails to create video reviews
  • Convert the video speech to text (transcript) with the Media Indexer API
  • Moderate the transcript with the text moderation service
  • Add the moderated transcript to the video review

Prerequisites

Enter credentials

Edit the App.config file and add the Active Directory tenant name, service endpoints, and subscription keys indicated by #####. You need the following information:

Key | Description
AzureMediaServiceRestApiEndpoint | Endpoint for the Azure Media Services (AMS) API
ClientSecret | Subscription key for Azure Media Services
ClientId | Client ID for Azure Media Services
AzureAdTenantName | Active Directory tenant name representing your organization
ContentModeratorReviewApiSubscriptionKey | Subscription key for the Content Moderator review API
ContentModeratorApiEndpoint | Endpoint for the Content Moderator API
ContentModeratorTeamId | Content Moderator team ID

Examine the main code

The Program class in Program.cs is the main entry point to the video moderation application.

Methods of the Program class

Method | Description
Main | Parses the command line, gathers user input, and starts processing.
ProcessVideo | Compresses, uploads, moderates, and creates video reviews.
CreateVideoStreamingRequest | Creates a stream to upload a video.
GetUserInputs | Gathers user input; used when no command-line options are present.
Initialize | Initializes objects needed for the moderation process.

The Main method

Main() is where execution starts, so it's the place to start understanding the video moderation process.

static void Main(string[] args)
{
    if (args.Length == 0)
    {
        string videoPath = string.Empty;
        Initialize();
        GetUserInputs(out videoPath);
        AmsConfigurations.logFilePath = Path.Combine(Path.GetDirectoryName(videoPath), "log.txt");
        try
        {
            ProcessVideo(videoPath).Wait();
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
    else
    {
        DirectoryInfo directoryInfo = new DirectoryInfo(args[0]);
        if (args.Length == 2) bool.TryParse(args[1], out generateVtt);
        Initialize();
        AmsConfigurations.logFilePath = Path.Combine(args[0], "log.txt");
        var files = directoryInfo.GetFiles("*.mp4", SearchOption.AllDirectories);
        foreach (var file in files)
        {
            try
            {
                ProcessVideo(file.FullName).Wait();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }
}

Main() handles the following command-line arguments:

  • The path to a directory containing MPEG-4 video files to be submitted for moderation. All *.mp4 files in this directory and its subdirectories are submitted for moderation.
  • Optionally, a Boolean (true/false) flag indicating whether text transcripts should be generated for the purpose of moderating audio.

If no command-line arguments are present, Main() calls GetUserInputs(). This method prompts the user to enter the path to a single video file and to specify whether a text transcript should be generated.
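
The argument handling described above can be sketched as a small helper. This is only an illustration, not code from the sample: the CommandLineOptions class and its TryParse method are hypothetical names, and the logic simply mirrors what Main() does with args[0] and args[1].

```csharp
using System;

// Hypothetical helper (not part of the sample) that mirrors Main()'s argument handling.
public static class CommandLineOptions
{
    // Returns true when a directory argument was supplied. A false result means
    // the caller should fall back to interactive input, as Main() does.
    public static bool TryParse(string[] args, out string directory, out bool generateVtt)
    {
        directory = null;
        generateVtt = false;

        if (args == null || args.Length == 0)
        {
            return false;
        }

        directory = args[0];

        // The second argument is optional; an unparsable value leaves the flag false.
        if (args.Length == 2)
        {
            bool.TryParse(args[1], out generateVtt);
        }

        return true;
    }
}
```

With this sketch, an invocation such as `app.exe C:\videos true` would yield the directory C:\videos with transcript generation enabled, while an invocation with no arguments would return false.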

Note

The console application uses the Azure Media Indexer API to generate transcripts from the uploaded video's audio track. The results are provided in WebVTT format. For more information on this format, see Web Video Text Tracks Format.

Initialize and ProcessVideo methods

Regardless of whether the program's options came from the command line or from interactive user input, Main() next calls Initialize() to create the following instances:

Class | Description
AmsComponent | Compresses video files before submitting them for moderation.
AmsConfigurations | Interface to the application's configuration data, found in App.config.
VideoModerator | Uploads, encodes, encrypts, and moderates videos using the AMS SDK.
VideoReviewApi | Manages video reviews in the Content Moderator service.

These classes (aside from AmsConfigurations, which is straightforward) are covered in more detail in upcoming sections of this tutorial.

Finally, the video files are processed one at a time by calling ProcessVideo() for each.

private static async Task ProcessVideo(string videoPath)
{
    var watch = System.Diagnostics.Stopwatch.StartNew();
    Console.ForegroundColor = ConsoleColor.White;
    Console.WriteLine("\nVideo compression process started...");

    var compressedVideoPath = amsComponent.CompressVideo(videoPath);
    if (string.IsNullOrWhiteSpace(compressedVideoPath))
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine("Video Compression failed.");
    }

    Console.WriteLine("\nVideo compression process completed...");

    UploadVideoStreamRequest uploadVideoStreamRequest = CreateVideoStreamingRequest(compressedVideoPath);
    UploadAssetResult uploadResult = new UploadAssetResult();

    if (generateVtt)
    {
        uploadResult.GenerateVTT = generateVtt;
    }
    Console.WriteLine("\nVideo moderation process started...");

    if (!videoModerator.CreateAzureMediaServicesJobToModerateVideo(uploadVideoStreamRequest, uploadResult))
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine("\nVideo moderation process failed.");
    }

    Console.WriteLine("\nVideo moderation process completed...");
    Console.WriteLine("\nVideo review process started...");

    string reviewId = await videoReviewApi.CreateVideoReviewInContentModerator(uploadResult);

    watch.Stop();

    Console.WriteLine("\nVideo review successfully completed...");
    Console.WriteLine("\nTotal Elapsed Time: {0}", watch.Elapsed);
    Logger.Log("Video File Name: " + Path.GetFileName(videoPath));
    Logger.Log($"ReviewId: {reviewId}");
    Logger.Log($"Total Elapsed Time: {watch.Elapsed}");
}

The ProcessVideo() method is fairly straightforward. It performs the following operations in order:

  • Compresses the video
  • Uploads the video to an Azure Media Services asset
  • Creates an AMS job to moderate the video
  • Creates a video review in Content Moderator

The following sections consider in more detail some of the individual processes invoked by ProcessVideo().

Compress the video

To minimize network traffic, the application converts video files to H.264 (MPEG-4 AVC) format and scales them to a maximum width of 640 pixels. The H.264 codec is recommended due to its high efficiency (compression rate). The compression is done using the free ffmpeg command-line tool, which is included in the Lib folder of the Visual Studio solution. The input files may be of any format supported by ffmpeg, including most commonly used video file formats and codecs.

Note

When you start the program using command-line options, you specify a directory containing the video files to be submitted for moderation. All files in this directory and its subdirectories that have the .mp4 filename extension are processed. To process other filename extensions, update the Main() method in Program.cs to include the desired extensions.
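
One way to support additional extensions is sketched below. It is an assumption about how Main() could be adapted, not code from the sample: the VideoFileFinder class and FindVideos method are hypothetical names. The sketch enumerates every file under the directory once and keeps those whose extension is in a caller-supplied set.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of the sample) for collecting video files
// with more than one filename extension.
public static class VideoFileFinder
{
    // Extensions include the leading dot, for example ".mp4" or ".mov".
    public static List<string> FindVideos(string directory, params string[] extensions)
    {
        // Compare extensions case-insensitively so that "CLIP.MP4" also matches.
        var wanted = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);

        return Directory
            .EnumerateFiles(directory, "*", SearchOption.AllDirectories)
            .Where(path => wanted.Contains(Path.GetExtension(path)))
            .OrderBy(path => path, StringComparer.Ordinal)
            .ToList();
    }
}
```

Main() could then loop over the result of a call such as FindVideos(args[0], ".mp4", ".mov") and pass each path to ProcessVideo().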

The code that compresses a single video file is in the AmsComponent class in AMSComponent.cs. The method responsible for this functionality is CompressVideo(), shown here.

public string CompressVideo(string videoPath)
{
    string ffmpegBlobUrl;
    if (!ValidatePreRequisites())
    {
        Console.WriteLine("Configurations check failed. Please cross check the configurations!");
        throw new Exception();
    }

    if (File.Exists(_configObj.FfmpegExecutablePath))
    {
        ffmpegBlobUrl = this._configObj.FfmpegExecutablePath;
    }
    else
    {
        Console.WriteLine("ffmpeg.exe is missing. Please check the Lib folder");
        throw new Exception();
    }

    string videoFilePathCom = videoPath.Split('.')[0] + "_c.mp4";
    ProcessStartInfo processStartInfo = new ProcessStartInfo();
    processStartInfo.WindowStyle = ProcessWindowStyle.Hidden;
    processStartInfo.FileName = ffmpegBlobUrl;
    processStartInfo.Arguments = "-i \"" + videoPath + "\" -vcodec libx264 -n -crf 32 -preset veryfast -vf scale=640:-1 -c:a aac -aq 1 -ac 2 -threads 0 \"" + videoFilePathCom + "\"";
    var process = Process.Start(processStartInfo);
    process.WaitForExit();
    process.Close();
    return videoFilePathCom;
}

The code performs the following steps:

  • Checks to make sure the configuration in App.config contains all necessary data
  • Checks to make sure the ffmpeg binary is present
  • Builds the output filename by appending _c.mp4 to the base name of the file (such as Example.mp4 -> Example_c.mp4)
  • Builds a command-line string to perform the conversion
  • Starts an ffmpeg process using the command line
  • Waits for the video to be processed
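
Note that CompressVideo() builds the output filename with videoPath.Split('.')[0], which keeps only the text before the first dot in the whole path. A path with a dot in a folder name, or a second dot in the filename, would therefore be truncated. The following is a minimal sketch of an alternative that uses the System.IO.Path helpers instead. The OutputNames class and BuildCompressedPath method are hypothetical names, not part of the sample.

```csharp
using System.IO;

// Hypothetical helper (not part of the sample) that derives the compressed
// file's path without being confused by extra dots in the input path.
public static class OutputNames
{
    public static string BuildCompressedPath(string videoPath)
    {
        // GetDirectoryName returns an empty string for a bare filename.
        string directory = Path.GetDirectoryName(videoPath) ?? string.Empty;

        // Strips only the final extension, so "my.clip.mov" becomes "my.clip".
        string baseName = Path.GetFileNameWithoutExtension(videoPath);

        return Path.Combine(directory, baseName + "_c.mp4");
    }
}
```

For a plain input such as Example.mp4, this produces Example_c.mp4, the same name the step list above describes.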

Note

If you know your videos are already compressed using H.264 and have appropriate dimensions, you can rewrite CompressVideo() to skip the compression.

The method returns the filename of the compressed output file.

Upload and moderate the video

The video must be stored in Azure Media Services before it can be processed by the Content Moderation service. The Program class in Program.cs has a short method, CreateVideoStreamingRequest(), that returns an object representing the streaming request used to upload the video.

private static UploadVideoStreamRequest CreateVideoStreamingRequest(string compressedVideoFilePath)
{
    return
        new UploadVideoStreamRequest
        {
            VideoStream = File.ReadAllBytes(compressedVideoFilePath),
            VideoName = Path.GetFileName(compressedVideoFilePath),
            EncodingRequest = new EncodingRequest()
            {
                EncodingBitrate = AmsEncoding.AdaptiveStreaming
            },
            VideoFilePath = compressedVideoFilePath
        };
}

The resulting UploadVideoStreamRequest object is defined in UploadVideoStreamRequest.cs (and its parent, UploadVideoRequest, in UploadVideoRequest.cs). These classes aren't shown here; they're short and serve only to hold the compressed video data and information about it. Another data-only class, UploadAssetResult (UploadAssetResult.cs), is used to hold the results of the upload process. Now it's possible to understand these lines in ProcessVideo():

UploadVideoStreamRequest uploadVideoStreamRequest = CreateVideoStreamingRequest(compressedVideoPath);
UploadAssetResult uploadResult = new UploadAssetResult();

if (generateVtt)
{
    uploadResult.GenerateVTT = generateVtt;
}
Console.WriteLine("\nVideo moderation process started...");

if (!videoModerator.CreateAzureMediaServicesJobToModerateVideo(uploadVideoStreamRequest, uploadResult))
{
    Console.ForegroundColor = ConsoleColor.Red;
    Console.WriteLine("\nVideo moderation process failed.");
}

These lines perform the following tasks:

  • Create an UploadVideoStreamRequest to upload the compressed video
  • Set the GenerateVTT flag on the UploadAssetResult if the user has requested a text transcript
  • Call CreateAzureMediaServicesJobToModerateVideo() to perform the upload and receive the result

Examine video moderation code

The method CreateAzureMediaServicesJobToModerateVideo() is in VideoModerator.cs, which contains the bulk of the code that interacts with Azure Media Services. The method's source code is shown in the following extract.

public bool CreateAzureMediaServicesJobToModerateVideo(UploadVideoStreamRequest uploadVideoRequest, UploadAssetResult uploadResult)
{
    asset = CreateAsset(uploadVideoRequest);
    uploadResult.VideoName = uploadVideoRequest.VideoName;
    // Encoding the asset , Moderating the asset, Generating transcript in parallel
    IAsset encodedAsset = null;
    //Creates the job for the tasks.
    IJob job = this._mediaContext.Jobs.Create("AMS Review Job");

    //Adding encoding task to job.
    ConfigureEncodeAssetTask(uploadVideoRequest.EncodingRequest, job);

    ConfigureContentModerationTask(job);

    //adding transcript task to job.
    if (uploadResult.GenerateVTT)
    {
        ConfigureTranscriptTask(job);
    }

    var watch = System.Diagnostics.Stopwatch.StartNew();
    //submit and execute job.
    job.Submit();
    job.GetExecutionProgressTask(new CancellationTokenSource().Token).Wait();
    watch.Stop();
    Logger.Log($"AMS Job Elapsed Time: {watch.Elapsed}");

    if (job.State == JobState.Error)
    {
        throw new Exception("Video moderation has failed due to AMS Job error.");
    }

    UploadAssetResult result = uploadResult;
    encodedAsset = job.OutputMediaAssets[0];
    result.ModeratedJson = GetCmDetail(job.OutputMediaAssets[1]);
    // Check for valid Moderated JSON
    var jsonModerateObject = JsonConvert.DeserializeObject<VideoModerationResult>(result.ModeratedJson);

    if (jsonModerateObject == null)
    {
        return false;
    }
    if (uploadResult.GenerateVTT)
    {
        GenerateTranscript(job.OutputMediaAssets.Last());
    }

    uploadResult.StreamingUrlDetails = PublishAsset(encodedAsset);
    string downloadUrl = GenerateDownloadUrl(asset, uploadVideoRequest.VideoName);
    uploadResult.StreamingUrlDetails.DownloadUri = downloadUrl;
    uploadResult.VideoName = uploadVideoRequest.VideoName;
    uploadResult.VideoFilePath = uploadVideoRequest.VideoFilePath;
    return true;
}

This code performs the following tasks:

  • Creates an AMS job for the processing to be done
  • Adds tasks for encoding the video file, moderating it, and (if requested) generating a text transcript
  • Submits the job, uploading the file and beginning processing
  • Retrieves the moderation results, the text transcript (if requested), and other information

Sample video moderation output

The result of the video moderation job (see the video moderation quickstart) is a JSON data structure containing the moderation results. These results include a breakdown of the fragments (shots) within the video, each containing events (clips) with key frames that have been flagged for review. Each key frame is scored by the likelihood that it contains adult or racy content. The following example shows a JSON response:

{
    "version": 2,
    "timescale": 90000,
    "offset": 0,
    "framerate": 50,
    "width": 1280,
    "height": 720,
    "totalDuration": 18696321,
    "fragments": [
        {
            "start": 0,
            "duration": 18000
        },
        {
            "start": 18000,
            "duration": 3600,
            "interval": 3600,
            "events": [
                [
                    {
                        "reviewRecommended": false,
                        "adultScore": 0.00001,
                        "racyScore": 0.03077,
                        "index": 5,
                        "timestamp": 18000,
                        "shotIndex": 0
                    }
                ]
            ]
        },
        {
            "start": 18386372,
            "duration": 119149,
            "interval": 119149,
            "events": [
                [
                    {
                        "reviewRecommended": true,
                        "adultScore": 0.00000,
                        "racyScore": 0.91902,
                        "index": 5085,
                        "timestamp": 18386372,
                        "shotIndex": 62
                    }
                ]
            ]
        }
    ]
}
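
The start, duration, and timestamp values in this output are expressed in timescale units ("ticks"), where timescale is the number of ticks per second (90000 in the sample above). The following is a minimal sketch of converting such a value to milliseconds, which is the unit needed when lining up frames with transcript captions. The ModerationTime class and ToMilliseconds method are hypothetical names, not part of the sample.

```csharp
using System;

// Hypothetical helper (not part of the sample) for converting moderation
// timestamps from timescale ticks to whole milliseconds.
public static class ModerationTime
{
    public static long ToMilliseconds(long ticks, long timescale)
    {
        if (timescale <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(timescale), "Timescale must be positive.");
        }

        // Multiply before dividing so that integer division does not discard
        // the sub-second part; any fraction of a millisecond is truncated.
        return ticks * 1000 / timescale;
    }
}
```

For example, the first flagged-frame timestamp in the sample, 18000 ticks at a timescale of 90000, corresponds to 200 milliseconds into the video.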

A transcription of the audio from the video is also produced when the GenerateVTT flag is set.

Create a human review

The moderation process returns a list of key frames from the video, along with a transcript of its audio tracks. The next step is to create a review in the Content Moderator Review tool for human moderators. Going back to the ProcessVideo() method in Program.cs, you see the call to the CreateVideoReviewInContentModerator() method. This method is in the VideoReviewApi class, which is in VideoReviewAPI.cs, and is shown here.

public async Task<string> CreateVideoReviewInContentModerator(UploadAssetResult uploadAssetResult)
{
    string reviewId = string.Empty;
    List<ProcessedFrameDetails> frameEntityList = framegenerator.CreateVideoFrames(uploadAssetResult);
    string path = uploadAssetResult.GenerateVTT == true ? this._amsConfig.FfmpegFramesOutputPath + Path.GetFileNameWithoutExtension(uploadAssetResult.VideoName) + "_aud_SpReco.vtt" : "";
    TranscriptScreenTextResult screenTextResult = new TranscriptScreenTextResult();
    if (File.Exists(path))
    {
        screenTextResult = await GenerateTextScreenProfanity(reviewId, path, frameEntityList);
        uploadAssetResult.Category1TextScore = screenTextResult.Category1Score;
        uploadAssetResult.Category2TextScore = screenTextResult.Category2Score;
        uploadAssetResult.Category3TextScore = screenTextResult.Category3Score;
        uploadAssetResult.Category1TextTag = screenTextResult.Category1Tag;
        uploadAssetResult.Category2TextTag = screenTextResult.Category2Tag;
        uploadAssetResult.Category3TextTag = screenTextResult.Category3Tag;
    }
    var reviewVideoRequestJson = CreateReviewRequestObject(uploadAssetResult, frameEntityList);
    if (string.IsNullOrWhiteSpace(reviewVideoRequestJson))
    {
        throw new Exception("Video review process failed in CreateVideoReviewInContentModerator");
    }
    var reviewIds = await ExecuteCreateReviewApi(reviewVideoRequestJson);
    reviewId = reviewIds.FirstOrDefault();
    frameEntityList = framegenerator.GenerateFrameImages(frameEntityList, uploadAssetResult, reviewId);
    await CreateAndPublishReviewInContentModerator(uploadAssetResult, frameEntityList, reviewId, path, screenTextResult);

    return reviewId;
}

CreateVideoReviewInContentModerator() calls several other methods to perform the following tasks:

Note

The console application uses the FFmpeg library for generating thumbnails. These thumbnails (images) correspond to the frame timestamps in the video moderation output.

Task | Methods | File
Extract the key frames from the video and create thumbnail images of them | CreateVideoFrames(), GenerateFrameImages() | FrameGeneratorServices.cs
Scan the text transcript, if available, to locate adult or racy audio | GenerateTextScreenProfanity() | VideoReviewAPI.cs
Prepare and submit a video review request for human inspection | CreateReviewRequestObject(), ExecuteCreateReviewApi(), CreateAndPublishReviewInContentModerator() | VideoReviewAPI.cs

The following screen shows the results of the previous steps.

[Figure: Video review default view]

Process the transcript

Until now, the code presented in this tutorial has focused on the visual content. Review of speech content is a separate and optional process that, as mentioned, uses a transcript generated from the audio. It's time now to take a look at how text transcripts are created and used in the review process. The task of generating the transcript falls to the Azure Media Indexer service.

The application performs the following tasks:

Task | Methods | File
Determine whether text transcripts are to be generated | Main(), GetUserInputs() | Program.cs
If so, submit a transcription job as part of moderation | ConfigureTranscriptTask() | VideoModerator.cs
Get a local copy of the transcript | GenerateTranscript() | VideoModerator.cs
Flag frames of the video that contain inappropriate audio | GenerateTextScreenProfanity(), TextScreen() | VideoReviewAPI.cs
Add the results to the review | UploadScreenTextResult(), ExecuteAddTranscriptSupportFile() | VideoReviewAPI.cs

Task configuration

Let's jump right into submitting the transcription job. CreateAzureMediaServicesJobToModerateVideo() (already described) calls ConfigureTranscriptTask().

private void ConfigureTranscriptTask(IJob job)
{
    string mediaProcessorName = _amsConfigurations.MediaIndexer2MediaProcessor;
    IMediaProcessor processor = _mediaContext.MediaProcessors.GetLatestMediaProcessorByName(mediaProcessorName);

    string configuration = File.ReadAllText(_amsConfigurations.MediaIndexerConfigurationJson);
    ITask task = job.Tasks.AddNew("AudioIndexing Task", processor, configuration, TaskOptions.None);
    task.InputAssets.Add(asset);
    task.OutputAssets.AddNew("AudioIndexing Output Asset", AssetCreationOptions.None);
}

The configuration for the transcript task is read from the file MediaIndexerConfig.json in the solution's Lib folder. The task takes the uploaded video asset as its input, and a new AMS asset is created to hold the output of the transcription process. When the AMS job runs, this task creates a text transcript from the video file's audio track.

Note

The sample application recognizes speech in US English only.

Transcript generation

The transcript is published as an AMS asset. To scan the transcript for objectionable content, the application downloads the asset from Azure Media Services. CreateAzureMediaServicesJobToModerateVideo() calls GenerateTranscript(), shown here, to retrieve the file.

public bool GenerateTranscript(IAsset asset)
{
    try
    {
        var outputFolder = this._amsConfigurations.FfmpegFramesOutputPath;
        IAsset outputAsset = asset;
        IAccessPolicy policy = null;
        ILocator locator = null;
        // Note: the policy name says 30 days, but the duration passed here is 360 days.
        policy = _mediaContext.AccessPolicies.Create("My 30 days readonly policy", TimeSpan.FromDays(360), AccessPermissions.Read);
        // Start the locator five minutes in the past to tolerate clock skew.
        locator = _mediaContext.Locators.CreateLocator(LocatorType.Sas, outputAsset, policy, DateTime.UtcNow.AddMinutes(-5));
        DownloadAssetToLocal(outputAsset, outputFolder);
        locator.Delete();
        return true;
    }
    catch
    {   //TODO:  Logging
        Console.WriteLine("Exception occurred while generating index for video.");
        throw;
    }
}

After some necessary AMS setup, the actual download is performed by calling DownloadAssetToLocal(), a general-purpose function that copies an AMS asset to a local file.

Moderate the transcript

With the transcript in hand, the application scans it and uses it in the review. Creating the review is the responsibility of CreateVideoReviewInContentModerator(), which calls GenerateTextScreenProfanity() to do the job. In turn, this method calls TextScreen(), which contains most of the functionality.

TextScreen() performs the following tasks:

  • Parse the transcript for timestamps and captions
  • Submit each caption for text moderation
  • Flag any frames that may have objectionable speech content

Let's examine each of these tasks in more detail:

Initialize the code

First, initialize all variables and collections.

private async Task<TranscriptScreenTextResult> TextScreen(string filepath, List<ProcessedFrameDetails> frameEntityList)
{
    List<TranscriptProfanity> profanityList = new List<TranscriptProfanity>();
    bool category1Tag = false;
    bool category2Tag = false;
    bool category3Tag = false;
    double category1Score = 0;
    double category2Score = 0;
    double category3Score = 0;
    List<string> vttLines = File.ReadAllLines(filepath).Where(line => !line.Contains("NOTE Confidence:") && line.Length > 0).ToList();
    StringBuilder sb = new StringBuilder();
    List<CaptionScreentextResult> csrList = new List<CaptionScreentextResult>();
    CaptionScreentextResult captionScreentextResult = new CaptionScreentextResult() { Captions = new List<string>() };

Parse the transcript for captions

Next, parse the VTT-formatted transcript for captions and timestamps. The Review tool displays these captions in the Transcript tab on the video review screen. The timestamps are used to sync the captions with the corresponding video frames.

foreach (var line in vttLines.Skip(1))
{
    if (line.Contains("-->"))
    {
        if (sb.Length > 0)
        {
            captionScreentextResult.Captions.Add(sb.ToString());
            sb.Clear();
        }
        if (captionScreentextResult.Captions.Count > 0)
        {
            csrList.Add(captionScreentextResult);
            captionScreentextResult = new CaptionScreentextResult() { Captions = new List<string>() };
        }
        string[] times = line.Split(new string[] { "-->" }, StringSplitOptions.RemoveEmptyEntries);
        string startTimeString = times[0].Trim();
        string endTimeString = times[1].Trim();
        int startTime = (int)TimeSpan.ParseExact(startTimeString, @"hh\:mm\:ss\.fff", CultureInfo.InvariantCulture).TotalMilliseconds;
        int endTime = (int)TimeSpan.ParseExact(endTimeString, @"hh\:mm\:ss\.fff", CultureInfo.InvariantCulture).TotalMilliseconds;
        captionScreentextResult.StartTime = startTime;
        captionScreentextResult.EndTime = endTime;
    }
    else
    {
        sb.Append(line);
    }
    if (sb.Length + line.Length > 1024)
    {
        captionScreentextResult.Captions.Add(sb.ToString());
        sb.Clear();
    }
}
if (sb.Length > 0)
{
    captionScreentextResult.Captions.Add(sb.ToString());
}
if (captionScreentextResult.Captions.Count > 0)
{
    csrList.Add(captionScreentextResult);
}
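
The timing-line handling in the loop above can be isolated into a small, testable helper. This is a sketch under the same assumption the sample makes, namely that every timing line has the exact form hh:mm:ss.fff --> hh:mm:ss.fff with no trailing cue settings. The VttCue class and TryParseTimes method are hypothetical names, not part of the sample. Unlike TimeSpan.ParseExact(), the TryParseExact() call used here reports malformed input by returning false instead of throwing.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of the sample) that parses one WebVTT timing
// line into start and end offsets in milliseconds.
public static class VttCue
{
    private const string TimeFormat = @"hh\:mm\:ss\.fff";

    public static bool TryParseTimes(string line, out int startMs, out int endMs)
    {
        startMs = 0;
        endMs = 0;

        // A timing line has exactly one "-->" separating the two timestamps.
        string[] parts = line.Split(new[] { "-->" }, StringSplitOptions.None);
        if (parts.Length != 2)
        {
            return false;
        }

        TimeSpan start;
        TimeSpan end;
        if (!TimeSpan.TryParseExact(parts[0].Trim(), TimeFormat, CultureInfo.InvariantCulture, out start) ||
            !TimeSpan.TryParseExact(parts[1].Trim(), TimeFormat, CultureInfo.InvariantCulture, out end))
        {
            return false;
        }

        startMs = (int)start.TotalMilliseconds;
        endMs = (int)end.TotalMilliseconds;
        return true;
    }
}
```

A caption line such as "Hello" makes the helper return false, so a caller can treat any line that fails to parse as caption text.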

Moderate captions with the text moderation service

Next, we scan the parsed text captions with Content Moderator's text API.

Note

Your Content Moderator service key has a requests per second (RPS) rate limit. If you exceed the limit, the SDK throws an exception with a 429 error code.

A free tier key has a rate limit of one RPS.
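
The rate-limit handling described in this note can be sketched as a general retry helper: wait, make the call, and lengthen the wait by a factor of 1.5 each time the call is throttled. The Throttling class, the CallWithBackoff method, and all of its parameters are hypothetical names, not part of the sample or the SDK. The sketch also differs from the sample in two deliberate ways: it gives up after a fixed number of attempts, and it lets errors that aren't rate-limit errors propagate to the caller.

```csharp
using System;

// Hypothetical helper (not part of the sample or the SDK) that retries a call
// with a growing delay whenever the call is rejected for rate limiting.
public static class Throttling
{
    public static T CallWithBackoff<T>(
        Func<T> call,
        Func<Exception, bool> isRateLimited,
        Action<int> wait,
        int initialWaitMs = 1000,
        int maxAttempts = 5)
    {
        int waitMs = initialWaitMs;

        for (int attempt = 1; ; attempt++)
        {
            // Pause before every attempt, including the first, to stay under the limit.
            wait(waitMs);

            try
            {
                return call();
            }
            catch (Exception ex) when (isRateLimited(ex) && attempt < maxAttempts)
            {
                // Throttled: lengthen the delay by 1.5x and try again.
                waitMs = (int)(waitMs * 1.5);
            }
        }
    }
}
```

In an application, the wait argument would typically be ms => System.Threading.Thread.Sleep(ms), and isRateLimited could check for the 429 error code in the same way the sample does. Passing the wait as a delegate keeps the helper testable without real delays.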

int waitTime = 1000;
    foreach (var csr in csrList)
    {
        bool captionAdultTextTag = false;
        bool captionRacyTextTag = false;
        bool captionOffensiveTextTag = false;
        Screen screenRes = new Screen();

        foreach (var caption in csr.Captions)
        {
            // Reset for every caption; otherwise only the first caption of each cue is screened.
            bool retry = true;
            while (retry)
            {
                try
                {
                    System.Threading.Thread.Sleep(waitTime);
                    var lang = await CMClient.TextModeration.DetectLanguageAsync("text/plain", caption);
                    var res = await CMClient.TextModeration.ScreenTextWithHttpMessagesAsync(lang.DetectedLanguageProperty, caption, string.Empty, null, null, null, true);
                    screenRes = res.Body;
                    retry = false;
                }
                catch (Exception e)
                {
                    if (e.Message.Contains("429"))
                    {
                        Console.WriteLine($"Moderation API call failed. Message: {e.Message}");
                        waitTime = (int)(waitTime * 1.5);
                        Console.WriteLine($"wait time: {waitTime}");
                    }
                    else
                    {
                        retry = false;
                        Console.WriteLine($"Moderation API call failed. Message: {e.Message}");
                    }
                }
            }
             
            if (screenRes != null)
            {
                TranscriptProfanity transcriptProfanity = new TranscriptProfanity();
                transcriptProfanity.TimeStamp = "";
                List<Terms> transcriptTerm = new List<Terms>();
                if (screenRes.Terms != null)
                {
                    foreach (var term in screenRes.Terms)
                    {
                        var profanityobject = new Terms
                        {
                            Term = term.Term,
                            Index = term.Index.Value
                        };
                        transcriptTerm.Add(profanityobject);
                    }
                    transcriptProfanity.Terms = transcriptTerm;
                    profanityList.Add(transcriptProfanity);
                }
                if (screenRes.Classification.Category1.Score.Value > _amsConfig.Category1TextThreshold) captionAdultTextTag = true;
                if (screenRes.Classification.Category2.Score.Value > _amsConfig.Category2TextThreshold) captionRacyTextTag = true;
                if (screenRes.Classification.Category3.Score.Value > _amsConfig.Category3TextThreshold) captionOffensiveTextTag = true;
                if (screenRes.Classification.Category1.Score.Value > _amsConfig.Category1TextThreshold) category1Tag = true;
                if (screenRes.Classification.Category2.Score.Value > _amsConfig.Category2TextThreshold) category2Tag = true;
                if (screenRes.Classification.Category3.Score.Value > _amsConfig.Category3TextThreshold) category3Tag = true;
                category1Score = screenRes.Classification.Category1.Score.Value > category1Score ? screenRes.Classification.Category1.Score.Value : category1Score;
                category2Score = screenRes.Classification.Category2.Score.Value > category2Score ? screenRes.Classification.Category2.Score.Value : category2Score;
                category3Score = screenRes.Classification.Category3.Score.Value > category3Score ? screenRes.Classification.Category3.Score.Value : category3Score;
            }
            foreach (var frame in frameEntityList.Where(x => x.TimeStamp >= csr.StartTime && x.TimeStamp <= csr.EndTime))
            {
                frame.IsAdultTextContent = captionAdultTextTag;
                frame.IsRacyTextContent = captionRacyTextTag;
                frame.IsOffensiveTextContent = captionOffensiveTextTag;
            }
        }
    }
    TranscriptScreenTextResult screenTextResult = new TranscriptScreenTextResult()
    {
        TranscriptProfanity = profanityList,
        Category1Tag = category1Tag,
        Category2Tag = category2Tag,
        Category3Tag = category3Tag,
        Category1Score = category1Score,
        Category2Score = category2Score,
        Category3Score = category3Score
    };
    return screenTextResult;
}
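
The retry loop in the listing above can be factored into a small helper. The following is a minimal sketch, not the sample's actual code: RetryHelper, RetryOnThrottleAsync, and maxAttempts are hypothetical names. It follows the listing in treating any exception whose message contains "429" as a throttling signal and in multiplying the wait by 1.5 each time. Unlike the listing, it caps the number of attempts and lets other exceptions propagate to the caller instead of logging and swallowing them.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of the sample): retries an async call while
// the service reports throttling, growing the delay by 1.5x on each attempt.
public static class RetryHelper
{
    public static async Task<T> RetryOnThrottleAsync<T>(
        Func<Task<T>> operation, int initialWaitMs, int maxAttempts)
    {
        int waitTime = initialWaitMs;
        for (int attempt = 1; ; attempt++)
        {
            // Pause before every call, as the listing does, to stay under the rate limit.
            await Task.Delay(waitTime);
            try
            {
                return await operation();
            }
            catch (Exception e) when (e.Message.Contains("429") && attempt < maxAttempts)
            {
                // Assumption carried over from the listing: "429" in the message means throttled.
                waitTime = (int)(waitTime * 1.5);
                Console.WriteLine($"Throttled; next wait time: {waitTime} ms");
            }
        }
    }
}
```

With such a helper, the body of the while loop could shrink to a single awaited call that wraps ScreenTextWithHttpMessagesAsync in a lambda. Whether a cap on attempts is appropriate depends on how long a video's transcript is and how strict the service tier's rate limit is.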

Text moderation breakdown

TextScreen() is a substantial method, so let's break it down.

  1. First, the method reads the transcript file line by line. It ignores blank lines and lines containing a NOTE with a confidence score. It extracts the time stamps and text items from the cues in the file. A cue represents text from the audio track and includes start and end times. A cue begins with a time stamp line containing the string -->, followed by one or more lines of text.

  2. Instances of CaptionScreentextResult (defined in TranscriptProfanity.cs) hold the information parsed from each cue. When a new time stamp line is detected, or the maximum text length of 1024 characters is reached, a new CaptionScreentextResult is added to csrList.

  3. Next, the method submits each cue to the Text Moderation API. It calls both ContentModeratorClient.TextModeration.DetectLanguageAsync() and ContentModeratorClient.TextModeration.ScreenTextWithHttpMessagesAsync(), which are defined in the Microsoft.Azure.CognitiveServices.ContentModerator assembly. To avoid being rate-limited, the method pauses for a second before submitting each cue. If a call still fails with a 429 error, the method multiplies the wait time by 1.5 and retries.

  4. After receiving results from the Text Moderation service, the method checks whether the scores exceed the confidence thresholds. These values are established in App.config as OffensiveTextThreshold, RacyTextThreshold, and AdultTextThreshold. In the listing, the scores are compared against _amsConfig.Category1TextThreshold (adult), _amsConfig.Category2TextThreshold (racy), and _amsConfig.Category3TextThreshold (offensive). The objectionable terms themselves are also stored. Finally, all frames within the cue's time range are flagged as containing offensive, racy, and/or adult text.

  5. TextScreen() returns a TranscriptScreenTextResult instance that contains the text moderation result for the video as a whole. This object includes flags and scores for the various types of objectionable content, along with a list of all objectionable terms. The caller, CreateVideoReviewInContentModerator(), calls UploadScreenTextResult() to attach this information to the review so it is available to human reviewers.
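
The parsing described in steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions rather than the sample's own parser: the Cue class and CueParser.Parse are hypothetical stand-ins for CaptionScreentextResult and the parsing code in TextScreen(), time stamps are kept as raw strings instead of being converted to numbers, and the 1024-character limit is applied by starting a new entry with the same time range.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for CaptionScreentextResult.
public class Cue
{
    public string StartTime = "";
    public string EndTime = "";
    public List<string> Captions = new List<string>();
}

public static class CueParser
{
    private const int MaxTextLength = 1024; // limit named in step 2

    public static List<Cue> Parse(IEnumerable<string> lines)
    {
        var cues = new List<Cue>();
        Cue current = null;
        int length = 0;

        foreach (var raw in lines)
        {
            var line = raw.Trim();
            // Skip blank lines and NOTE lines (which carry confidence scores).
            if (line.Length == 0 || line.Contains("NOTE")) continue;

            if (line.Contains("-->"))
            {
                // A time stamp line starts a new cue.
                var parts = line.Split(new[] { "-->" }, StringSplitOptions.None);
                current = new Cue { StartTime = parts[0].Trim(), EndTime = parts[1].Trim() };
                cues.Add(current);
                length = 0;
            }
            else if (current != null)
            {
                // If adding this line would exceed the limit, continue in a new
                // entry that keeps the same time range.
                if (length > 0 && length + line.Length > MaxTextLength)
                {
                    current = new Cue { StartTime = current.StartTime, EndTime = current.EndTime };
                    cues.Add(current);
                    length = 0;
                }
                current.Captions.Add(line);
                length += line.Length;
            }
        }
        return cues;
    }
}
```

Any text that appears before the first time stamp line, such as a file header, is ignored because no cue is open yet. Each returned entry can then be submitted to the text moderation service as step 3 describes.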

The following screenshot shows the result of the transcript generation and moderation steps.

[Screenshot: video moderation transcript view]

Program output

The following command-line output from the program shows the various tasks as they are completed. Additionally, the moderation result (in JSON format) and the speech transcript are available in the same directory as the original video files.

Microsoft.ContentModerator.AMSComponentClient
Enter the fully qualified local path for Uploading the video :
"Your File Name.MP4"
Generate Video Transcript? [y/n] : y

Video compression process started...
Video compression process completed...

Video moderation process started...
Video moderation process completed...

Video review process started...
Video Frames Creation inprogress...
Frames(83) created successfully.
Review Created Successfully and the review Id 201801va8ec2108d6e043229ba7a9e6373edec5
Video review successfully completed...

Total Elapsed Time: 00:05:56.8420355

Next steps

In this tutorial, you set up an application that moderates video content, including transcript content, and creates reviews in the Review tool. Next, learn more about the details of video moderation.