Quickstart: Recognize speech stored in Blob storage

Important

Speech SDK version 1.11.0 or later is required.

In this quickstart, you'll use a REST API to recognize speech from files in a batch process. A batch process executes the speech transcription without any user interaction. It gives you a simple programming model, without the need to manage concurrency, custom speech models, or other details, and it offers advanced control options while making efficient use of Azure Speech service resources.

For more information on the available options and configuration details, see batch transcription.

The following quickstart walks you through a usage sample.

If you prefer to jump right in, view or download all Speech SDK C# samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Open your project in Visual Studio

The first step is to make sure that you have your project open in Visual Studio.

  1. Launch Visual Studio 2019.
  2. Load your project and open Program.cs.

Add a reference to Newtonsoft.Json

  1. In Solution Explorer, right-click the helloworld project, and then select Manage NuGet Packages to show the NuGet Package Manager.
  2. In the upper-right corner, find the Package Source drop-down box, and make sure that nuget.org is selected.
  3. In the upper-left corner, select Browse.
  4. In the search box, type Newtonsoft.Json and press Enter.
  5. From the search results, select the Newtonsoft.Json package, and then select Install to install the latest stable version.
  6. Accept all agreements and licenses to start the installation. After the package is installed, a confirmation appears in the Package Manager Console window.

Start with some boilerplate code

Let's add some code that works as a skeleton for our project.

class Program
{
    // Replace with your subscription key
    const string SubscriptionKey = "YourSubscriptionKey";

    // Update with your service region
    const string Region = "YourServiceRegion";
    const int Port = 443;
 
    // Recordings and locale
    const string Locale = "en-US";
    const string RecordingsBlobUri = "YourFileUrl";
 
    // Name and description
    const string Name = "Simple transcription";
    const string Description = "Simple transcription description";
 
    const string SpeechToTextBasePath = "api/speechtotext/v2.0/";
 
    static async Task Main()
    {
        // Cognitive Services follows security best practices.
        // If you experience connectivity issues, see:
        // https://docs.microsoft.com/dotnet/framework/network-programming/tls
        // For non-Windows 10 users, force TLS 1.2 (requires "using System.Net;"):
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        await TranscribeAsync();
    }
 
    static async Task TranscribeAsync()
    {
        Console.WriteLine("Starting transcriptions client...");
    }
}

You'll need to replace the following values:

  • YourSubscriptionKey: found on the Keys page of the Azure portal for the Speech resource
  • YourServiceRegion: found on the Overview page of the Azure portal for the Speech resource
  • YourFileUrl: found under the Blob service / Containers page of the Azure portal for the Storage account resource
    • Select the appropriate container
    • Select the desired blob
    • Copy the URL on the Properties page

JSON wrappers

Because the REST API takes requests in JSON format and also returns results in JSON, we could interact with it using only strings, but that's not recommended. To make the requests and responses easier to manage, we'll declare a few classes to use for serializing and deserializing the JSON.

Go ahead and put their declarations after TranscribeAsync.

public class ModelIdentity
{
    ModelIdentity(Guid id) => Id = id;

    public Guid Id { get; private set; }

    public static ModelIdentity Create(Guid Id) => new ModelIdentity(Id);
}

public class Transcription
{
    [JsonConstructor]
    Transcription(
        Guid id,
        string name,
        string description,
        string locale,
        DateTime createdDateTime,
        DateTime lastActionDateTime,
        string status,
        Uri recordingsUrl,
        IReadOnlyDictionary<string, string> resultsUrls)
    {
        Id = id;
        Name = name;
        Description = description;
        CreatedDateTime = createdDateTime;
        LastActionDateTime = lastActionDateTime;
        Status = status;
        Locale = locale;
        RecordingsUrl = recordingsUrl;
        ResultsUrls = resultsUrls;
    }

    public string Name { get; set; }

    public string Description { get; set; }

    public string Locale { get; set; }

    public Uri RecordingsUrl { get; set; }

    public IReadOnlyDictionary<string, string> ResultsUrls { get; set; }

    public Guid Id { get; set; }

    public DateTime CreatedDateTime { get; set; }

    public DateTime LastActionDateTime { get; set; }

    public string Status { get; set; }

    public string StatusMessage { get; set; }
}

public class TranscriptionDefinition
{
    TranscriptionDefinition(
        string name,
        string description,
        string locale,
        Uri recordingsUrl,
        IEnumerable<ModelIdentity> models)
    {
        Name = name;
        Description = description;
        RecordingsUrl = recordingsUrl;
        Locale = locale;
        Models = models;
        Properties = new Dictionary<string, string>
        {
            ["PunctuationMode"] = "DictatedAndAutomatic",
            ["ProfanityFilterMode"] = "Masked",
            ["AddWordLevelTimestamps"] = "True"
        };
    }

    public string Name { get; set; }

    public string Description { get; set; }

    public Uri RecordingsUrl { get; set; }

    public string Locale { get; set; }

    public IEnumerable<ModelIdentity> Models { get; set; }

    public IDictionary<string, string> Properties { get; set; }

    public static TranscriptionDefinition Create(
        string name,
        string description,
        string locale,
        Uri recordingsUrl)
        => new TranscriptionDefinition(name, description, locale, recordingsUrl, new ModelIdentity[0]);
}

Create and configure an HTTP client

The first thing we'll need is an HTTP client with the correct base URL and authentication set. Insert this code in TranscribeAsync.

var client = new HttpClient
{
    Timeout = TimeSpan.FromMinutes(25),
    BaseAddress = new UriBuilder(Uri.UriSchemeHttps, $"{Region}.cris.azure.cn", Port).Uri,
    DefaultRequestHeaders =
    {
        { "Ocp-Apim-Subscription-Key", SubscriptionKey }
    }
};

Generate a transcription request

Next, we'll generate the transcription request. Add this code to TranscribeAsync.

var transcriptionDefinition =
    TranscriptionDefinition.Create(
        Name,
        Description,
        Locale,
        new Uri(RecordingsBlobUri));

var res = JsonConvert.SerializeObject(transcriptionDefinition);
var sc = new StringContent(res);
sc.Headers.ContentType = JsonMediaTypeFormatter.DefaultMediaType;
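
Note that JsonMediaTypeFormatter (and the ReadAsAsync<T> extension used later) comes from the System.Net.Http.Formatting namespace, which ships in the Microsoft.AspNet.WebApi.Client NuGet package; if your project doesn't already reference that package, install it the same way you installed Newtonsoft.Json. If you'd rather avoid the extra dependency for the request body, a minimal alternative (assuming the same transcriptionDefinition) is to set the media type directly:

var res = JsonConvert.SerializeObject(transcriptionDefinition);
// Set the JSON content type without System.Net.Http.Formatting.
var sc = new StringContent(res, System.Text.Encoding.UTF8, "application/json");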

Send the request and check its status

Now we post the request to the Speech service and check the initial response code. This response code simply indicates whether the service has received the request. The service returns a URL in the response headers; that's the location where it will store the transcription status.

Uri transcriptionLocation = null;
using (var response = await client.PostAsync($"{SpeechToTextBasePath}Transcriptions/", sc))
{
    if (!response.IsSuccessStatusCode)
    {
        Console.WriteLine("Error {0} starting transcription.", response.StatusCode);
        return;
    }

    transcriptionLocation = response.Headers.Location;
}

Wait for the transcription to complete

Since the service processes the transcription asynchronously, we need to poll for its status every so often. We'll check every 5 seconds.

We can check the status by retrieving the content at the URL we got when we posted the request. When we get the content back, we deserialize it into one of our helper classes to make it easier to interact with.

Here's the polling code, with status display for everything except a successful completion; we'll handle that next.

Console.WriteLine($"Created transcription at location {transcriptionLocation}.");
Console.WriteLine("Checking status.");

var completed = false;

// Check for the status of our transcriptions periodically
while (!completed)
{
    Transcription transcription = null;
    using (var response = await client.GetAsync(transcriptionLocation.AbsolutePath))
    {
        var contentType = response.Content.Headers.ContentType;
        if (response.IsSuccessStatusCode &&
            string.Equals(contentType.MediaType, "application/json", StringComparison.OrdinalIgnoreCase))
        {
            transcription = await response.Content.ReadAsAsync<Transcription>();
        }
        else
        {
            Console.WriteLine("Error with status {0} getting transcription result", response.StatusCode);
            continue;
        }
    }

    switch (transcription.Status)
    {
        case "Failed":
            completed = true;
            Console.WriteLine("Transcription failed. Status: {0}", transcription.StatusMessage);
            break;

        case "Succeeded":
            break;

        case "Running":
            Console.WriteLine("Transcription is still running.");
            break;

        case "NotStarted":
            Console.WriteLine("Transcription has not started.");
            break;
    }

    await Task.Delay(TimeSpan.FromSeconds(5));
}

Console.WriteLine("Press any key...");
Console.ReadKey();

Display the transcription results

Once the service has successfully completed the transcription, the results are stored at another URL that we can get from the status response. Here we make a request to download those results into a temporary file before reading and deserializing them. Once the results are loaded, we can print them to the console. Add the following code to the case "Succeeded": label.

completed = true;
var webClient = new WebClient();
var filename = Path.GetTempFileName();
webClient.DownloadFile(transcription.ResultsUrls["channel_0"], filename);
var results = File.ReadAllText(filename);
Console.WriteLine($"Transcription succeeded. Results: {Environment.NewLine}{results}");
File.Delete(filename);
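
WebClient is an older API; if you'd rather not use it, the results can be fetched with the HttpClient created earlier, with no temporary file. A minimal sketch, assuming the client and transcription variables from the surrounding code:

// Alternative: download the results with the existing HttpClient.
// An absolute URI overrides the client's BaseAddress.
var resultsUri = new Uri(transcription.ResultsUrls["channel_0"]);
var resultsJson = await client.GetStringAsync(resultsUri);
Console.WriteLine($"Transcription succeeded. Results: {Environment.NewLine}{resultsJson}");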

Check your code

At this point, your code should look like this (we've added some comments to this version):

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading.Tasks;
using System.Net.Http;
using System.Net.Http.Formatting;

namespace BatchClient
{
    class Program
    {
        // Replace with your subscription key
        const string SubscriptionKey = "YourSubscriptionKey";

        // Update with your service region
        const string Region = "YourServiceRegion";
        const int Port = 443;

        // Recordings and locale
        const string Locale = "en-US";
        const string RecordingsBlobUri = "YourFileUrl";

        // Name and description
        const string Name = "Simple transcription";
        const string Description = "Simple transcription description";

        const string SpeechToTextBasePath = "api/speechtotext/v2.0/";

        static async Task Main()
        {
            // For non-Windows 10 users.
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

            await TranscribeAsync();
        }

        static async Task TranscribeAsync()
        {
            Console.WriteLine("Starting transcriptions client...");

            // Create the client object and authenticate
            var client = new HttpClient
            {
                Timeout = TimeSpan.FromMinutes(25),
                BaseAddress = new UriBuilder(Uri.UriSchemeHttps, $"{Region}.cris.azure.cn", Port).Uri,
                DefaultRequestHeaders =
                {
                    { "Ocp-Apim-Subscription-Key", SubscriptionKey }
                }
            };

            var transcriptionDefinition =
                TranscriptionDefinition.Create(
                    Name,
                    Description,
                    Locale,
                    new Uri(RecordingsBlobUri));

            var res = JsonConvert.SerializeObject(transcriptionDefinition);
            var sc = new StringContent(res);
            sc.Headers.ContentType = JsonMediaTypeFormatter.DefaultMediaType;

            Uri transcriptionLocation = null;

            using (var response = await client.PostAsync($"{SpeechToTextBasePath}Transcriptions/", sc))
            {
                if (!response.IsSuccessStatusCode)
                {
                    Console.WriteLine("Error {0} starting transcription.", response.StatusCode);
                    return;
                }

                transcriptionLocation = response.Headers.Location;
            }

            Console.WriteLine($"Created transcription at location {transcriptionLocation}.");
            Console.WriteLine("Checking status.");

            var completed = false;

            // Check for the status of our transcriptions periodically
            while (!completed)
            {
                Transcription transcription = null;

                // Get all transcriptions for the user
                using (var response = await client.GetAsync(transcriptionLocation.AbsolutePath))
                {
                    var contentType = response.Content.Headers.ContentType;
                    if (response.IsSuccessStatusCode &&
                        string.Equals(contentType.MediaType, "application/json", StringComparison.OrdinalIgnoreCase))
                    {
                        transcription = await response.Content.ReadAsAsync<Transcription>();
                    }
                    else
                    {
                        Console.WriteLine("Error with status {0} getting transcription result", response.StatusCode);
                        continue;
                    }
                }

                // For each transcription in the list we check the status
                switch (transcription.Status)
                {
                    case "Failed":
                        completed = true;
                        Console.WriteLine("Transcription failed. Status: {0}", transcription.StatusMessage);
                        break;

                    case "Succeeded":
                        completed = true;
                        var webClient = new WebClient();
                        var filename = Path.GetTempFileName();
                        webClient.DownloadFile(transcription.ResultsUrls["channel_0"], filename);
                        var results = File.ReadAllText(filename);
                        Console.WriteLine($"Transcription succeeded. Results: {Environment.NewLine}{results}");
                        File.Delete(filename);
                        break;

                    case "Running":
                        Console.WriteLine("Transcription is still running.");
                        break;

                    case "NotStarted":
                        Console.WriteLine("Transcription has not started.");
                        break;
                }

                await Task.Delay(TimeSpan.FromSeconds(5));
            }

            Console.WriteLine("Press any key...");
            Console.ReadKey();
        }
    }

    public class ModelIdentity
    {
        ModelIdentity(Guid id) => Id = id;

        public Guid Id { get; private set; }

        public static ModelIdentity Create(Guid Id) => new ModelIdentity(Id);
    }

    public class Transcription
    {
        [JsonConstructor]
        Transcription(
            Guid id,
            string name,
            string description,
            string locale,
            DateTime createdDateTime,
            DateTime lastActionDateTime,
            string status,
            Uri recordingsUrl,
            IReadOnlyDictionary<string, string> resultsUrls)
        {
            Id = id;
            Name = name;
            Description = description;
            CreatedDateTime = createdDateTime;
            LastActionDateTime = lastActionDateTime;
            Status = status;
            Locale = locale;
            RecordingsUrl = recordingsUrl;
            ResultsUrls = resultsUrls;
        }

        public string Name { get; set; }

        public string Description { get; set; }

        public string Locale { get; set; }

        public Uri RecordingsUrl { get; set; }

        public IReadOnlyDictionary<string, string> ResultsUrls { get; set; }

        public Guid Id { get; set; }

        public DateTime CreatedDateTime { get; set; }

        public DateTime LastActionDateTime { get; set; }

        public string Status { get; set; }

        public string StatusMessage { get; set; }
    }

    public class TranscriptionDefinition
    {
        TranscriptionDefinition(
            string name,
            string description,
            string locale,
            Uri recordingsUrl,
            IEnumerable<ModelIdentity> models)
        {
            Name = name;
            Description = description;
            RecordingsUrl = recordingsUrl;
            Locale = locale;
            Models = models;
            Properties = new Dictionary<string, string>
            {
                ["PunctuationMode"] = "DictatedAndAutomatic",
                ["ProfanityFilterMode"] = "Masked",
                ["AddWordLevelTimestamps"] = "True"
            };
        }

        public string Name { get; set; }

        public string Description { get; set; }

        public Uri RecordingsUrl { get; set; }

        public string Locale { get; set; }

        public IEnumerable<ModelIdentity> Models { get; set; }

        public IDictionary<string, string> Properties { get; set; }

        public static TranscriptionDefinition Create(
            string name,
            string description,
            string locale,
            Uri recordingsUrl)
            => new TranscriptionDefinition(name, description, locale, recordingsUrl, new ModelIdentity[0]);
    }
}

Build and run your app

Now you're ready to build your app and test speech recognition using the Speech service.

  1. Compile the code - From the menu bar of Visual Studio, choose Build > Build Solution.
  2. Start your app - From the menu bar, choose Debug > Start Debugging, or press F5.
  3. Check the output - The app submits the transcription request for your audio file, polls the service until the transcription completes, and prints the results to the console.

Next steps

In this quickstart, you'll use a REST API to recognize speech from files in a batch process. A batch process executes the speech transcription without any user interaction. It gives you a simple programming model, without the need to manage concurrency, custom speech models, or other details, and it offers advanced control options while making efficient use of Azure Speech service resources.

For more information on the available options and configuration details, see batch transcription.

The following quickstart walks you through a usage sample.

If you prefer to jump right in, view or download all Speech SDK C++ samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Open your project in Visual Studio

The first step is to make sure that you have your project open in Visual Studio.

  1. Launch Visual Studio 2019.
  2. Load your project and open helloworld.cpp.

Add references

To speed up our code development, we'll use a couple of external components:

  • CPP Rest SDK, a client library for making REST calls to a REST service.
  • nlohmann/json, a handy JSON parsing/serialization/deserialization library.

Both can be installed using vcpkg.

vcpkg install cpprestsdk cpprestsdk:x64-windows
vcpkg install nlohmann-json
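
If you build with Visual Studio and MSBuild, you can optionally run vcpkg's user-wide integration so the installed packages are found automatically (this assumes a standard vcpkg installation):

vcpkg integrate install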

Start with some boilerplate code

Let's add some code that works as a skeleton for our project.

#include <iostream>
#include <strstream>
#include <Windows.h>
#include <locale>
#include <codecvt>
#include <string>

#include <cpprest/http_client.h>
#include <cpprest/filestream.h>
#include <nlohmann/json.hpp>

using namespace std;
using namespace utility;                    // Common utilities like string conversions
using namespace web;                        // Common features like URIs.
using namespace web::http;                  // Common HTTP functionality
using namespace web::http::client;          // HTTP client features
using namespace concurrency::streams;       // Asynchronous streams
using json = nlohmann::json;

const string_t region = U("YourServiceRegion");
const string_t subscriptionKey = U("YourSubscriptionKey");
const string name = "Simple transcription";
const string description = "Simple transcription description";
const string myLocale = "en-US";
const string recordingsBlobUri = "YourFileUrl";

void recognizeSpeech()
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t >> converter;

}

int wmain()
{
    recognizeSpeech();
    cout << "Please press a key to continue.\n";
    cin.get();
    return 0;
}

You'll need to replace the following values:

  • YourSubscriptionKey: found on the Keys page of the Azure portal for the Speech resource
  • YourServiceRegion: found on the Overview page of the Azure portal for the Speech resource
  • YourFileUrl: found under the Blob service / Containers page of the Azure portal for the Storage account resource
    • Select the appropriate container
    • Select the desired blob
    • Copy the URL on the Properties page

JSON wrappers

Because the REST API takes requests in JSON format and also returns results in JSON, we could interact with it using only strings, but that's not recommended. To make the requests and responses easier to manage, we'll declare a few classes to use for serializing and deserializing the JSON, along with some helper methods for nlohmann/json.

Go ahead and put their declarations before recognizeSpeech.

class TranscriptionDefinition {
private:
    TranscriptionDefinition(string name,
        string description,
        string locale,
        string recordingsUrl,
        std::list<string> models) {

        Name = name;
        Description = description;
        RecordingsUrl = recordingsUrl;
        Locale = locale;
        Models = models;
    }

public:
    string Name;
    string Description;
    string RecordingsUrl;
    string Locale;
    std::list<string> Models;
    std::map<string, string> properties;

    static TranscriptionDefinition Create(string name, string description, string locale, string recordingsUrl) {
        return TranscriptionDefinition(name, description, locale, recordingsUrl, std::list<string>());
    }
    static TranscriptionDefinition Create(string name, string description, string locale, string recordingsUrl,
        std::list<string> models) {
        return TranscriptionDefinition(name, description, locale, recordingsUrl, models);
    }
};

void to_json(nlohmann::json& j, const TranscriptionDefinition& t) {
    j = nlohmann::json{
            { "description", t.Description },
            { "locale", t.Locale },
            { "models", t.Models },
            { "name", t.Name },
            { "properties", t.properties },
            { "recordingsurl",t.RecordingsUrl }
    };
};

void from_json(const nlohmann::json& j, TranscriptionDefinition& t) {
    j.at("locale").get_to(t.Locale);
    j.at("models").get_to(t.Models);
    j.at("name").get_to(t.Name);
    j.at("properties").get_to(t.properties);
    j.at("recordingsurl").get_to(t.RecordingsUrl);
}

class Transcription {
public:
    string name;
    string description;
    string locale;
    string recordingsUrl;
    map<string, string> resultsUrls;
    string id;
    string createdDateTime;
    string lastActionDateTime;
    string status;
    string statusMessage;
};

void to_json(nlohmann::json& j, const Transcription& t) {
    j = nlohmann::json{
            { "description", t.description },
            { "locale", t.locale },
            { "createddatetime", t.createdDateTime },
            { "name", t.name },
            { "id", t.id },
            { "recordingsurl",t.recordingsUrl },
            { "resultUrls", t.resultsUrls},
            { "status", t.status},
            { "statusmessage", t.statusMessage}
    };
};

void from_json(const nlohmann::json& j, Transcription& t) {
    j.at("description").get_to(t.description);
    j.at("locale").get_to(t.locale);
    j.at("createdDateTime").get_to(t.createdDateTime);
    j.at("name").get_to(t.name);
    j.at("recordingsUrl").get_to(t.recordingsUrl);
    t.resultsUrls = j.at("resultsUrls").get<map<string, string>>();
    j.at("status").get_to(t.status);
    j.at("statusMessage").get_to(t.statusMessage);
}
class Result
{
public:
    string Lexical;
    string ITN;
    string MaskedITN;
    string Display;
};
void from_json(const nlohmann::json& j, Result& r) {
    j.at("Lexical").get_to(r.Lexical);
    j.at("ITN").get_to(r.ITN);
    j.at("MaskedITN").get_to(r.MaskedITN);
    j.at("Display").get_to(r.Display);
}

class NBest : public Result
{
public:
    double Confidence;  
};
void from_json(const nlohmann::json& j, NBest& nb) {
    j.at("Confidence").get_to(nb.Confidence);
    j.at("Lexical").get_to(nb.Lexical);
    j.at("ITN").get_to(nb.ITN);
    j.at("MaskedITN").get_to(nb.MaskedITN);
    j.at("Display").get_to(nb.Display);
}

class SegmentResult
{
public:
    string RecognitionStatus;
    ULONG Offset;
    ULONG Duration;
    std::list<NBest> NBest;
};
void from_json(const nlohmann::json& j, SegmentResult& sr) {
    j.at("RecognitionStatus").get_to(sr.RecognitionStatus);
    j.at("Offset").get_to(sr.Offset);
    j.at("Duration").get_to(sr.Duration);
    sr.NBest = j.at("NBest").get<list<NBest>>();
}

class AudioFileResult
{
public:
    string AudioFileName;
    std::list<SegmentResult> SegmentResults;
    std::list<Result> CombinedResults;
};
void from_json(const nlohmann::json& j, AudioFileResult& arf) {
    j.at("AudioFileName").get_to(arf.AudioFileName);
    arf.SegmentResults = j.at("SegmentResults").get<list<SegmentResult>>();
    arf.CombinedResults = j.at("CombinedResults").get<list<Result>>();
}

class RootObject {
public:
    std::list<AudioFileResult> AudioFileResults;
};
void from_json(const nlohmann::json& j, RootObject& r) {
    r.AudioFileResults = j.at("AudioFileResults").get<list<AudioFileResult>>();
}

Create and configure an HTTP client

The first thing we'll need is an HTTP client with the correct base URL and authentication set. Insert this code in recognizeSpeech.

utility::string_t service_url = U("https://") + region + U(".cris.azure.cn/api/speechtotext/v2.0/Transcriptions/");
uri u(service_url);

http_client c(u);
http_request msg(methods::POST);
msg.headers().add(U("Content-Type"), U("application/json"));
msg.headers().add(U("Ocp-Apim-Subscription-Key"), subscriptionKey);

Generate a transcription request

Next, we'll generate the transcription request. Add this code to recognizeSpeech.

auto transportdef = TranscriptionDefinition::Create(name, description, myLocale, recordingsBlobUri);

nlohmann::json transportdefJSON = transportdef;

msg.set_body(transportdefJSON.dump());

Send the request and check its status

Now we post the request to the Speech service and check the initial response code. This response code simply indicates whether the service has received the request. The service returns a URL in the response headers; that's the location where it will store the transcription status.

auto response = c.request(msg).get();
auto statusCode = response.status_code();

if (statusCode != status_codes::Accepted)
{
    cout << "Unexpected status code " << statusCode << endl;
    return;
}

string_t transcriptionLocation = response.headers()[U("location")];

cout << "Transcription status is located at " << converter.to_bytes(transcriptionLocation) << endl;

Wait for the transcription to complete

Since the service processes the transcription asynchronously, we need to poll for its status every so often. We'll check every 5 seconds.

We can check the status by retrieving the content at the URL we got when we posted the request. When we get the content back, we deserialize it into one of our helper classes to make it easier to interact with.

Here's the polling code, with status display for everything except a successful completion; we'll handle that next. It first sets up the HTTP client and request used to check the status.

http_client statusCheckClient(transcriptionLocation);
http_request statusCheckMessage(methods::GET);
statusCheckMessage.headers().add(U("Ocp-Apim-Subscription-Key"), subscriptionKey);

bool completed = false;

while (!completed)
{
    auto statusResponse = statusCheckClient.request(statusCheckMessage).get();
    auto statusResponseCode = statusResponse.status_code();

    if (statusResponseCode != status_codes::OK)
    {
        cout << "Fetching the transcription returned unexpected http code " << statusResponseCode << endl;
        return;
    }

    auto body = statusResponse.extract_string().get();
    nlohmann::json statusJSON = nlohmann::json::parse(body);
    Transcription transcriptionStatus = statusJSON;

    if (!_stricmp(transcriptionStatus.status.c_str(), "Failed"))
    {
        completed = true;
        cout << "Transcription has failed " << transcriptionStatus.statusMessage << endl;
    }
    else if (!_stricmp(transcriptionStatus.status.c_str(), "Succeeded"))
    {
    }
    else if (!_stricmp(transcriptionStatus.status.c_str(), "Running"))
    {
        cout << "Transcription is running." << endl;
    }
    else if (!_stricmp(transcriptionStatus.status.c_str(), "NotStarted"))
    {
        cout << "Transcription has not started." << endl;
    }

    if (!completed) {
        Sleep(5000);
    }

}

Display the transcription results

Once the service has successfully completed the transcription, the results are stored at another URL that we can get from the status response.

We'll download the contents of that URL, deserialize the JSON, and loop through the results, printing out the display text as we go. Add the following code to the empty Succeeded branch of the polling loop.

completed = true;
cout << "Success!" << endl;
string result = transcriptionStatus.resultsUrls["channel_0"];
cout << "Transcription has completed. Results are at " << result << endl;
cout << "Fetching results" << endl;

http_client result_client(converter.from_bytes(result));
http_request resultMessage(methods::GET);
resultMessage.headers().add(U("Ocp-Apim-Subscription-Key"), subscriptionKey);

auto resultResponse = result_client.request(resultMessage).get();

auto responseCode = resultResponse.status_code();

if (responseCode != status_codes::OK)
{
    cout << "Fetching the transcription returned unexpected http code " << responseCode << endl;
    return;
}

auto resultBody = resultResponse.extract_string().get();

nlohmann::json resultJSON = nlohmann::json::parse(resultBody);
RootObject root = resultJSON;

for (AudioFileResult af : root.AudioFileResults)
{
    cout << "There were " << af.SegmentResults.size() << " results in " << af.AudioFileName << endl;
    
    for (SegmentResult segResult : af.SegmentResults)
    {
        cout << "Status: " << segResult.RecognitionStatus << endl;

        if (!_stricmp(segResult.RecognitionStatus.c_str(), "success") && segResult.NBest.size() > 0)
        {
            cout << "Best text result was: '" << segResult.NBest.front().Display << "'" << endl;
        }
    }
}

Check your code

At this point, your code should look like this (we've added some comments to this version):

#include <iostream>
#include <strstream>
#include <Windows.h>
#include <locale>
#include <codecvt>
#include <string>

#include <cpprest/http_client.h>
#include <cpprest/filestream.h>
#include <nlohmann/json.hpp>

using namespace std;
using namespace utility;                    // Common utilities like string conversions
using namespace web;                        // Common features like URIs.
using namespace web::http;                  // Common HTTP functionality
using namespace web::http::client;          // HTTP client features
using namespace concurrency::streams;       // Asynchronous streams
using json = nlohmann::json;

const string_t region = U("YourServiceRegion");
const string_t subscriptionKey = U("YourSubscriptionKey");
const string name = "Simple transcription";
const string description = "Simple transcription description";
const string myLocale = "en-US";
const string recordingsBlobUri = "YourFileUrl";

class TranscriptionDefinition {
private:
    TranscriptionDefinition(string name,
        string description,
        string locale,
        string recordingsUrl,
        std::list<string> models) {

        Name = name;
        Description = description;
        RecordingsUrl = recordingsUrl;
        Locale = locale;
        Models = models;
    }

public:
    string Name;
    string Description;
    string RecordingsUrl;
    string Locale;
    std::list<string> Models;
    std::map<string, string> properties;

    static TranscriptionDefinition Create(string name, string description, string locale, string recordingsUrl) {
        return TranscriptionDefinition(name, description, locale, recordingsUrl, std::list<string>());
    }
    static TranscriptionDefinition Create(string name, string description, string locale, string recordingsUrl,
        std::list<string> models) {
        return TranscriptionDefinition(name, description, locale, recordingsUrl, models);
    }
};

void to_json(nlohmann::json& j, const TranscriptionDefinition& t) {
    j = nlohmann::json{
            { "description", t.Description },
            { "locale", t.Locale },
            { "models", t.Models },
            { "name", t.Name },
            { "properties", t.properties },
            { "recordingsurl",t.RecordingsUrl }
    };
};

void from_json(const nlohmann::json& j, TranscriptionDefinition& t) {
    j.at("locale").get_to(t.Locale);
    j.at("models").get_to(t.Models);
    j.at("name").get_to(t.Name);
    j.at("properties").get_to(t.properties);
    j.at("recordingsurl").get_to(t.RecordingsUrl);
}

class Transcription {
public:
    string name;
    string description;
    string locale;
    string recordingsUrl;
    map<string, string> resultsUrls;
    string id;
    string createdDateTime;
    string lastActionDateTime;
    string status;
    string statusMessage;
};

void to_json(nlohmann::json& j, const Transcription& t) {
    j = nlohmann::json{
            { "description", t.description },
            { "locale", t.locale },
            { "createddatetime", t.createdDateTime },
            { "name", t.name },
            { "id", t.id },
            { "recordingsurl",t.recordingsUrl },
            { "resultUrls", t.resultsUrls},
            { "status", t.status},
            { "statusmessage", t.statusMessage}
    };
};

void from_json(const nlohmann::json& j, Transcription& t) {
    j.at("description").get_to(t.description);
    j.at("locale").get_to(t.locale);
    j.at("createdDateTime").get_to(t.createdDateTime);
    j.at("name").get_to(t.name);
    j.at("recordingsUrl").get_to(t.recordingsUrl);
    t.resultsUrls = j.at("resultsUrls").get<map<string, string>>();
    j.at("status").get_to(t.status);
    j.at("statusMessage").get_to(t.statusMessage);
}
class Result
{
public:
    string Lexical;
    string ITN;
    string MaskedITN;
    string Display;
};
void from_json(const nlohmann::json& j, Result& r) {
    j.at("Lexical").get_to(r.Lexical);
    j.at("ITN").get_to(r.ITN);
    j.at("MaskedITN").get_to(r.MaskedITN);
    j.at("Display").get_to(r.Display);
}

class NBest : public Result
{
public:
    double Confidence;  
};
void from_json(const nlohmann::json& j, NBest& nb) {
    j.at("Confidence").get_to(nb.Confidence);
    j.at("Lexical").get_to(nb.Lexical);
    j.at("ITN").get_to(nb.ITN);
    j.at("MaskedITN").get_to(nb.MaskedITN);
    j.at("Display").get_to(nb.Display);
}

class SegmentResult
{
public:
    string RecognitionStatus;
    ULONG Offset;
    ULONG Duration;
    std::list<NBest> NBest;
};
void from_json(const nlohmann::json& j, SegmentResult& sr) {
    j.at("RecognitionStatus").get_to(sr.RecognitionStatus);
    j.at("Offset").get_to(sr.Offset);
    j.at("Duration").get_to(sr.Duration);
    sr.NBest = j.at("NBest").get<list<NBest>>();
}

class AudioFileResult
{
public:
    string AudioFileName;
    std::list<SegmentResult> SegmentResults;
    std::list<Result> CombinedResults;
};
void from_json(const nlohmann::json& j, AudioFileResult& arf) {
    j.at("AudioFileName").get_to(arf.AudioFileName);
    arf.SegmentResults = j.at("SegmentResults").get<list<SegmentResult>>();
    arf.CombinedResults = j.at("CombinedResults").get<list<Result>>();
}

class RootObject {
public:
    std::list<AudioFileResult> AudioFileResults;
};
void from_json(const nlohmann::json& j, RootObject& r) {
    r.AudioFileResults = j.at("AudioFileResults").get<list<AudioFileResult>>();
}


void recognizeSpeech()
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t >> converter;

    utility::string_t service_url = U("https://") + region + U(".cris.azure.cn/api/speechtotext/v2.0/Transcriptions/");
    uri u(service_url);

    http_client c(u);
    http_request msg(methods::POST);
    msg.headers().add(U("Content-Type"), U("application/json"));
    msg.headers().add(U("Ocp-Apim-Subscription-Key"), subscriptionKey);

    auto transportdef = TranscriptionDefinition::Create(name, description, myLocale, recordingsBlobUri);

    nlohmann::json transportdefJSON = transportdef;

    msg.set_body(transportdefJSON.dump());

    auto response = c.request(msg).get();
    auto statusCode = response.status_code();

    if (statusCode != status_codes::Accepted)
    {
        cout << "Unexpected status code " << statusCode << endl;
        return;
    }

    string_t transcriptionLocation = response.headers()[U("location")];

    cout << "Transcription status is located at " << converter.to_bytes(transcriptionLocation) << endl;

    http_client statusCheckClient(transcriptionLocation);
    http_request statusCheckMessage(methods::GET);
    statusCheckMessage.headers().add(U("Ocp-Apim-Subscription-Key"), subscriptionKey);

    bool completed = false;

    while (!completed)
    {
        auto statusResponse = statusCheckClient.request(statusCheckMessage).get();
        auto statusResponseCode = statusResponse.status_code();

        if (statusResponseCode != status_codes::OK)
        {
            cout << "Fetching the transcription returned unexpected http code " << statusResponseCode << endl;
            return;
        }

        auto body = statusResponse.extract_string().get();
        nlohmann::json statusJSON = nlohmann::json::parse(body);
        Transcription transcriptionStatus = statusJSON;

        if (!_stricmp(transcriptionStatus.status.c_str(), "Failed"))
        {
            completed = true;
            cout << "Transcription has failed " << transcriptionStatus.statusMessage << endl;
        }
        else if (!_stricmp(transcriptionStatus.status.c_str(), "Succeeded"))
        {
            completed = true;
            cout << "Success!" << endl;
            string result = transcriptionStatus.resultsUrls["channel_0"];
            cout << "Transcription has completed. Results are at " << result << endl;
            cout << "Fetching results" << endl;

            http_client result_client(converter.from_bytes(result));
            http_request resultMessage(methods::GET);
            resultMessage.headers().add(U("Ocp-Apim-Subscription-Key"), subscriptionKey);

            auto resultResponse = result_client.request(resultMessage).get();

            auto responseCode = resultResponse.status_code();

            if (responseCode != status_codes::OK)
            {
                cout << "Fetching the transcription returned unexpected http code " << responseCode << endl;
                return;
            }

            auto resultBody = resultResponse.extract_string().get();
            
            nlohmann::json resultJSON = nlohmann::json::parse(resultBody);
            RootObject root = resultJSON;

            for (AudioFileResult af : root.AudioFileResults)
            {
                cout << "There were " << af.SegmentResults.size() << " results in " << af.AudioFileName << endl;
                
                for (SegmentResult segResult : af.SegmentResults)
                {
                    cout << "Status: " << segResult.RecognitionStatus << endl;

                    if (!_stricmp(segResult.RecognitionStatus.c_str(), "success") && segResult.NBest.size() > 0)
                    {
                        cout << "Best text result was: '" << segResult.NBest.front().Display << "'" << endl;
                    }
                }
            }
        }
        else if (!_stricmp(transcriptionStatus.status.c_str(), "Running"))
        {
            cout << "Transcription is running." << endl;
        }
        else if (!_stricmp(transcriptionStatus.status.c_str(), "NotStarted"))
        {
            cout << "Transcription has not started." << endl;
        }

        if (!completed) {
            Sleep(5000);
        }

    }
}

int wmain()
{
    recognizeSpeech();
    cout << "Please press a key to continue.\n";
    cin.get();
    return 0;
}

Build and run your app

Now you're ready to build your app and test speech recognition using the Speech service.

Next steps


In this quickstart, you'll use a REST API to recognize speech from files in a batch process. A batch process executes the speech transcription without any user interaction. It gives you a simple programming model, without the need to manage concurrency, custom speech models, or other details, and it offers advanced control options while making efficient use of Azure Speech service resources.

For more information on the available options and configuration details, see batch transcription.

The following quickstart walks you through a usage sample.

If you prefer to jump right in, view or download all Speech SDK Java samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Open your project in Eclipse

The first step is to make sure that you have your project open in Eclipse.

  1. Launch Eclipse.
  2. Load your project and open Main.java.

Add a reference to Gson

We'll be using an external JSON serializer/deserializer in this quickstart. For Java, we've chosen Gson.

Open your pom.xml and add the following reference.

<dependencies>
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.8.6</version>
</dependency>
</dependencies>
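
If your project builds with Gradle instead of Maven, the equivalent dependency declaration (same Gson version) would look like this:

dependencies {
    implementation 'com.google.code.gson:gson:2.8.6'
}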

Start with some boilerplate code

Let's add some code that works as a skeleton for our project.

package quickstart;

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;
import java.util.Dictionary;
import java.util.Hashtable;
import java.util.UUID;

import com.google.gson.Gson;
public class Main {

    private static String region = "YourServiceRegion";
    private static String subscriptionKey = "YourSubscriptionKey";
    private static String Locale = "en-US";
    private static String RecordingsBlobUri = "YourFileUrl";
    private static String Name = "Simple transcription";
    private static String Description = "Simple transcription description";

    public static void main(String[] args) throws IOException, InterruptedException {
        System.out.println("Starting transcriptions client...");
    }
}

You'll need to replace the following values:

  • YourSubscriptionKey: found on the Keys page of the Azure portal for the Speech resource
  • YourServiceRegion: found on the Overview page of the Azure portal for the Speech resource
  • YourFileUrl: found under the Blob service / Containers page of the Azure portal for the Storage account resource
    • Select the appropriate container
    • Select the desired blob
    • Copy the URL on the Properties page

JSON wrappers

Because the REST API takes requests in JSON format and also returns results in JSON, we could interact with it using only strings, but that's not recommended. To make the requests and responses easier to manage, we'll declare a few classes to use for serializing and deserializing the JSON.

Go ahead and put their declarations before Main.

final class Transcription {
    public String name;
    public String description;
    public String locale;
    public URL recordingsUrl;
    public Hashtable<String, String> resultsUrls;
    public UUID id;
    public Date createdDateTime;
    public Date lastActionDateTime;
    public String status;
    public String statusMessage;
}

final class TranscriptionDefinition {
    private TranscriptionDefinition(String name, String description, String locale, URL recordingsUrl,
            ModelIdentity[] models) {
        this.Name = name;
        this.Description = description;
        this.RecordingsUrl = recordingsUrl;
        this.Locale = locale;
        this.Models = models;
        this.properties = new Hashtable<String, String>();
        this.properties.put("PunctuationMode", "DictatedAndAutomatic");
        this.properties.put("ProfanityFilterMode", "Masked");
        this.properties.put("AddWordLevelTimestamps", "True");
    }

    public String Name;
    public String Description;
    public URL RecordingsUrl;
    public String Locale;
    public ModelIdentity[] Models;
    public Dictionary<String, String> properties;

    public static TranscriptionDefinition Create(String name, String description, String locale, URL recordingsUrl) {
        return TranscriptionDefinition.Create(name, description, locale, recordingsUrl, new ModelIdentity[0]);
    }

    public static TranscriptionDefinition Create(String name, String description, String locale, URL recordingsUrl,
            ModelIdentity[] models) {
        return new TranscriptionDefinition(name, description, locale, recordingsUrl, models);
    }
}

final class ModelIdentity {
    private ModelIdentity(UUID id) {
        this.Id = id;
    }

    public UUID Id;

    public static ModelIdentity Create(UUID Id) {
        return new ModelIdentity(Id);
    }
}

class AudioFileResult {
    public String AudioFileName;
    public SegmentResult[] SegmentResults;
}

class RootObject {
    public AudioFileResult[] AudioFileResults;
}

class NBest {
    public double Confidence;
    public String Lexical;
    public String ITN;
    public String MaskedITN;
    public String Display;
}

class SegmentResult {
    public String RecognitionStatus;
    public String Offset;
    public String Duration;
    public NBest[] NBest;
}

Create and configure an HTTP client

The first thing we'll need is an HTTP client with the correct base URL and authentication set. Insert this code in main.

String url = "https://" + region + ".cris.azure.cn/api/speechtotext/v2.0/Transcriptions/";
URL serviceUrl = new URL(url);

HttpURLConnection postConnection = (HttpURLConnection) serviceUrl.openConnection();
postConnection.setDoOutput(true);
postConnection.setRequestMethod("POST");
postConnection.setRequestProperty("Content-Type", "application/json");
postConnection.setRequestProperty("Ocp-Apim-Subscription-Key", subscriptionKey);

Generate a transcription request

Next, we'll generate the transcription request. Add this code to main.

TranscriptionDefinition definition = TranscriptionDefinition.Create(Name, Description, Locale,
        new URL(RecordingsBlobUri));
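
For reference, the request body that Gson will produce from this definition looks roughly like the following (values are illustrative; Gson writes the java.net.URL as its string form):

{
  "Name": "Simple transcription",
  "Description": "Simple transcription description",
  "RecordingsUrl": "YourFileUrl",
  "Locale": "en-US",
  "Models": [],
  "properties": {
    "PunctuationMode": "DictatedAndAutomatic",
    "ProfanityFilterMode": "Masked",
    "AddWordLevelTimestamps": "True"
  }
}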

Send the request and check its status

Now we post the request to the Speech service and check the initial response code. This response code simply indicates whether the service has received the request. The service returns a URL in the response headers; that's the location where it will store the transcription status.

Gson gson = new Gson();

OutputStream stream = postConnection.getOutputStream();
stream.write(gson.toJson(definition).getBytes());
stream.flush();

int statusCode = postConnection.getResponseCode();

if (statusCode != HttpURLConnection.HTTP_ACCEPTED) {
    System.out.println("Unexpected status code " + statusCode);
    return;
}

Wait for the transcription to complete

Since the service processes the transcription asynchronously, we need to poll for its status every so often. We'll check every 5 seconds.

We can check the status by retrieving the content at the URL we got when we posted the request. When we get the content back, we deserialize it into one of our helper classes to make it easier to interact with.

Here's the polling code, with status display for everything except a successful completion; we'll handle that next.

String transcriptionLocation = postConnection.getHeaderField("location");

System.out.println("Transcription is located at " + transcriptionLocation);

URL transcriptionUrl = new URL(transcriptionLocation);

boolean completed = false;
while (!completed) {
    HttpURLConnection getConnection = (HttpURLConnection) transcriptionUrl.openConnection();
    getConnection.setRequestProperty("Ocp-Apim-Subscription-Key", subscriptionKey);
    getConnection.setRequestMethod("GET");

    int responseCode = getConnection.getResponseCode();

    if (responseCode != HttpURLConnection.HTTP_OK) {
        System.out.println("Fetching the transcription returned unexpected http code " + responseCode);
        return;
    }

    Transcription t = gson.fromJson(new InputStreamReader(getConnection.getInputStream()),
            Transcription.class);

    switch (t.status) {
    case "Failed":
        completed = true;
        System.out.println("Transcription has failed " + t.statusMessage);
        break;
    case "Succeeded":
        break;
    case "Running":
        System.out.println("Transcription is running.");
        break;
    case "NotStarted":
        System.out.println("Transcription has not started.");
        break;
    }

    if (!completed) {
        Thread.sleep(5000);
    }
}

Display the transcription results

Once the service has successfully completed the transcription, the results are stored at another URL that we can get from the status response.

We'll download the contents of that URL, deserialize the JSON, and loop through the results, printing out the display text as we go; this is handled in the case "Succeeded": branch of the full listing below.

Check your code

At this point, your code should look like this (we've added some comments to this version):

case "Succeeded":
    completed = true;
    String result = t.resultsUrls.get("channel_0");
    System.out.println("Transcription has completed. Results are at " + result);

    System.out.println("Fetching results");
    URL resultUrl = new URL(result);

    HttpURLConnection resultConnection = (HttpURLConnection) resultUrl.openConnection();
    resultConnection.setRequestProperty("Ocp-Apim-Subscription-Key", subscriptionKey);
    resultConnection.setRequestMethod("GET");

    responseCode = resultConnection.getResponseCode();

    if (responseCode != HttpURLConnection.HTTP_OK) {
        System.out.println("Fetching the transcription returned unexpected http code " + responseCode);
        return;
    }

    RootObject root = gson.fromJson(new InputStreamReader(resultConnection.getInputStream()),
            RootObject.class);

    for (AudioFileResult af : root.AudioFileResults) {
        System.out.println("There were " + af.SegmentResults.length + " results in " + af.AudioFileName);
        for (SegmentResult segResult : af.SegmentResults) {
            System.out.println("Status: " + segResult.RecognitionStatus);
            if (segResult.RecognitionStatus.equalsIgnoreCase("success") && segResult.NBest.length > 0) {
                System.out.println("Best text result was: '" + segResult.NBest[0].Display + "'");
            }
        }
    }

    break;

Check your code

At this point, your code should look like this (we've added some comments to this version):

package quickstart;

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;
import java.util.Dictionary;
import java.util.Hashtable;
import java.util.UUID;

import com.google.gson.Gson;

final class Transcription {
    public String name;
    public String description;
    public String locale;
    public URL recordingsUrl;
    public Hashtable<String, String> resultsUrls;
    public UUID id;
    public Date createdDateTime;
    public Date lastActionDateTime;
    public String status;
    public String statusMessage;
}

final class TranscriptionDefinition {
    private TranscriptionDefinition(String name, String description, String locale, URL recordingsUrl,
            ModelIdentity[] models) {
        this.Name = name;
        this.Description = description;
        this.RecordingsUrl = recordingsUrl;
        this.Locale = locale;
        this.Models = models;
        this.properties = new Hashtable<String, String>();
        this.properties.put("PunctuationMode", "DictatedAndAutomatic");
        this.properties.put("ProfanityFilterMode", "Masked");
        this.properties.put("AddWordLevelTimestamps", "True");
    }

    public String Name;
    public String Description;
    public URL RecordingsUrl;
    public String Locale;
    public ModelIdentity[] Models;
    public Dictionary<String, String> properties;

    public static TranscriptionDefinition Create(String name, String description, String locale, URL recordingsUrl) {
        return TranscriptionDefinition.Create(name, description, locale, recordingsUrl, new ModelIdentity[0]);
    }

    public static TranscriptionDefinition Create(String name, String description, String locale, URL recordingsUrl,
            ModelIdentity[] models) {
        return new TranscriptionDefinition(name, description, locale, recordingsUrl, models);
    }
}

final class ModelIdentity {
    private ModelIdentity(UUID id) {
        this.Id = id;
    }

    public UUID Id;

    public static ModelIdentity Create(UUID Id) {
        return new ModelIdentity(Id);
    }
}

class AudioFileResult {
    public String AudioFileName;
    public SegmentResult[] SegmentResults;
}

class RootObject {
    public AudioFileResult[] AudioFileResults;
}

class NBest {
    public double Confidence;
    public String Lexical;
    public String ITN;
    public String MaskedITN;
    public String Display;
}

class SegmentResult {
    public String RecognitionStatus;
    public String Offset;
    public String Duration;
    public NBest[] NBest;
}

public class Main {

    private static String region = "YourServiceRegion";
    private static String subscriptionKey = "YourSubscriptionKey";
    private static String Locale = "en-US";
    private static String RecordingsBlobUri = "YourFileUrl";
    private static String Name = "Simple transcription";
    private static String Description = "Simple transcription description";

    public static void main(String[] args) throws IOException, InterruptedException {
        System.out.println("Starting transcriptions client...");
        String url = "https://" + region + ".cris.azure.cn/api/speechtotext/v2.0/Transcriptions/";
        URL serviceUrl = new URL(url);

        // Configure a POST request with a JSON body and the subscription key header
        HttpURLConnection postConnection = (HttpURLConnection) serviceUrl.openConnection();
        postConnection.setDoOutput(true);
        postConnection.setRequestMethod("POST");
        postConnection.setRequestProperty("Content-Type", "application/json");
        postConnection.setRequestProperty("Ocp-Apim-Subscription-Key", subscriptionKey);

        // Describe the transcription: name, description, locale, and the recording's SAS URL
        TranscriptionDefinition definition = TranscriptionDefinition.Create(Name, Description, Locale,
                new URL(RecordingsBlobUri));

        Gson gson = new Gson();

        OutputStream stream = postConnection.getOutputStream();
        stream.write(gson.toJson(definition).getBytes());
        stream.flush();

        // The service returns 202 Accepted once it has queued the transcription
        int statusCode = postConnection.getResponseCode();

        if (statusCode != HttpURLConnection.HTTP_ACCEPTED) {
            System.out.println("Unexpected status code " + statusCode);
            return;
        }

        // The "location" response header points at the transcription's status URL
        String transcriptionLocation = postConnection.getHeaderField("location");

        System.out.println("Transcription is located at " + transcriptionLocation);

        URL transcriptionUrl = new URL(transcriptionLocation);

        // Poll the status every 5 seconds until the transcription succeeds or fails
        boolean completed = false;
        while (!completed) {
            {
                HttpURLConnection getConnection = (HttpURLConnection) transcriptionUrl.openConnection();
                getConnection.setRequestProperty("Ocp-Apim-Subscription-Key", subscriptionKey);
                getConnection.setRequestMethod("GET");

                int responseCode = getConnection.getResponseCode();

                if (responseCode != HttpURLConnection.HTTP_OK) {
                    System.out.println("Fetching the transcription returned unexpected http code " + responseCode);
                    return;
                }

                Transcription t = gson.fromJson(new InputStreamReader(getConnection.getInputStream()),
                        Transcription.class);

                switch (t.status) {
                case "Failed":
                    completed = true;
                    System.out.println("Transcription has failed " + t.statusMessage);
                    break;
                case "Succeeded":
                    completed = true;
                    String result = t.resultsUrls.get("channel_0");
                    System.out.println("Transcription has completed. Results are at " + result);

                    System.out.println("Fetching results");
                    URL resultUrl = new URL(result);

                    HttpURLConnection resultConnection = (HttpURLConnection) resultUrl.openConnection();
                    resultConnection.setRequestProperty("Ocp-Apim-Subscription-Key", subscriptionKey);
                    resultConnection.setRequestMethod("GET");

                    responseCode = resultConnection.getResponseCode();

                    if (responseCode != HttpURLConnection.HTTP_OK) {
                        System.out.println("Fetching the transcription returned unexpected http code " + responseCode);
                        return;
                    }

                    RootObject root = gson.fromJson(new InputStreamReader(resultConnection.getInputStream()),
                            RootObject.class);

                    for (AudioFileResult af : root.AudioFileResults) {
                        System.out
                                .println("There were " + af.SegmentResults.length + " results in " + af.AudioFileName);
                        for (SegmentResult segResult : af.SegmentResults) {
                            System.out.println("Status: " + segResult.RecognitionStatus);
                            if (segResult.RecognitionStatus.equalsIgnoreCase("success") && segResult.NBest.length > 0) {
                                System.out.println("Best text result was: '" + segResult.NBest[0].Display + "'");
                            }
                        }
                    }

                    break;
                case "Running":
                    System.out.println("Transcription is running.");
                    break;
                case "NotStarted":
                    System.out.println("Transcription has not started.");
                    break;
                }

                if (!completed) {
                    Thread.sleep(5000);
                }
            }
        }
    }
}

Build and run your app

Now you're ready to build your app and test speech recognition using the Speech service.

Next steps

In this quickstart, you will use a REST API to recognize speech from files in a batch process. A batch process executes the speech transcription without any user interactions. It gives you a simple programming model, without the need to manage concurrency, custom speech models, or other details. It offers advanced control options, while making efficient use of Azure Speech service resources.

For more information on the available options and configuration details, see batch transcription.

The following quickstart will walk you through a usage sample.

If you prefer to jump right in, view or download all Speech SDK Python Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Download and install the API client library

To execute the sample, you need to generate the Python library for the REST API, which is generated through Swagger.

Follow these steps for the installation:

  1. Go to https://editor.swagger.io.
  2. Click File, then click Import URL.
  3. Enter the Swagger URL, including the region for your Speech service subscription: https://<your-region>.cris.azure.cn/docs/v2.0/swagger.
  4. Click Generate Client and select Python.
  5. Save the client library.
  6. Extract the downloaded python-client-generated.zip somewhere in your file system.
  7. Install the extracted python-client module in your Python environment using pip: pip install path/to/package/python-client.
  8. The installed package has the name swagger_client. You can check that the installation worked using the command python -c "import swagger_client".

Note

Due to a known bug in the Swagger autogeneration, you might encounter errors on importing the swagger_client package. These can be fixed by deleting the line with the content

from swagger_client.models.model import Model  # noqa: F401,E501

from the file swagger_client/models/model.py, and the line with the content

from swagger_client.models.inner_error import InnerError  # noqa: F401,E501

from the file swagger_client/models/inner_error.py inside the installed package. The error message will tell you where these files are located for your installation.
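
If you'd rather not edit the installed package by hand, a small script can make the same two deletions. This is just a convenience sketch, assuming a standard site-packages installation; adjust the paths if pip placed the package elsewhere.

# Convenience sketch: strip the two problematic imports from the generated
# swagger_client package. Assumes a standard site-packages layout.
import pathlib
import site

FIXES = [
    ("swagger_client/models/model.py",
     "from swagger_client.models.model import Model"),
    ("swagger_client/models/inner_error.py",
     "from swagger_client.models.inner_error import InnerError"),
]

for root in site.getsitepackages():
    for relative_path, bad_import in FIXES:
        target = pathlib.Path(root) / relative_path
        if target.exists():
            lines = target.read_text().splitlines(keepends=True)
            target.write_text("".join(l for l in lines if bad_import not in l))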

Install other dependencies

The sample uses the requests library. You can install it with the command:

pip install requests

Start with some boilerplate code

Let's add some code that works as a skeleton for our project.

#!/usr/bin/env python
# coding: utf-8
from typing import List

import logging
import sys
import requests
import time
import swagger_client as cris_client


logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, format="%(message)s")

# Your subscription key and region for the Speech service
SUBSCRIPTION_KEY = "YourSubscriptionKey"
SERVICE_REGION = "YourServiceRegion"

NAME = "Simple transcription"
DESCRIPTION = "Simple transcription description"

LOCALE = "en-US"
RECORDINGS_BLOB_URI = "<Your SAS Uri to the recording>"

# Set subscription information when doing transcription with custom models
ADAPTED_ACOUSTIC_ID = None  # guid of a custom acoustic model
ADAPTED_LANGUAGE_ID = None  # guid of a custom language model


def transcribe():
    logging.info("Starting transcription client...")


if __name__ == "__main__":
    transcribe()

You'll need to replace the following values:

  • YourSubscriptionKey: found on the Keys page of the Azure portal for the Speech resource
  • YourServiceRegion: found on the Overview page of the Azure portal for the Speech resource
  • YourFileUrl (in this sample, RECORDINGS_BLOB_URI): found under the Blob service / Containers page of the Azure portal for the Storage account resource, or generated in code as in the sketch after this list
    • Select the appropriate container
    • Select the desired blob
    • Copy the URL under the Properties page
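
If you'd rather generate the SAS URI programmatically than copy it from the portal, a sketch like the one below can produce it. Treat this as an assumption-laden convenience: it presumes the azure-storage-blob v12 package (pip install azure-storage-blob) and the China-cloud blob endpoint, and the account name, key, container, and blob names are placeholders.

# Sketch: build a read-only SAS URL for a recording blob (azure-storage-blob v12).
# All account/container/blob values below are placeholders.
from datetime import datetime, timedelta

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

account_name = "yourstorageaccount"      # placeholder
account_key = "YourStorageAccountKey"    # placeholder
container_name = "recordings"            # placeholder
blob_name = "speech.wav"                 # placeholder

sas_token = generate_blob_sas(
    account_name=account_name,
    container_name=container_name,
    blob_name=blob_name,
    account_key=account_key,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=2),
)

# China-cloud blob endpoint, to match the .azure.cn services in this quickstart.
RECORDINGS_BLOB_URI = "https://{}.blob.core.chinacloudapi.cn/{}/{}?{}".format(
    account_name, container_name, blob_name, sas_token)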

Create and configure an HTTP client

The first thing we'll need is an HTTP client that has a correct base URL and authentication set. Insert this code in transcribe:

configuration = cris_client.Configuration()
configuration.api_key['Ocp-Apim-Subscription-Key'] = SUBSCRIPTION_KEY
configuration.host = "https://{}.cris.azure.cn".format(SERVICE_REGION)

# create the client object and authenticate
client = cris_client.ApiClient(configuration)

# create an instance of the transcription api class
transcription_api = cris_client.CustomSpeechTranscriptionsApi(api_client=client)

Generate a transcription request

Next, we'll generate the transcription request. Add this code to transcribe:

transcription_definition = cris_client.TranscriptionDefinition(
    name=NAME, description=DESCRIPTION, locale=LOCALE, recordings_url=RECORDINGS_BLOB_URI
)

Send the request and check its status

Now we post the request to the Speech service and check the initial response code. This response code simply indicates whether the service has received the request. The service returns a URL in the response headers; that's the location where it will store the transcription status.

data, status, headers = transcription_api.create_transcription_with_http_info(transcription_definition)

# extract transcription location from the headers
transcription_location: str = headers["location"]

# get the transcription Id from the location URI
created_transcription: str = transcription_location.split('/')[-1]

logging.info("Created new transcription with id {}".format(created_transcription))

Wait for the transcription to complete

Since the service processes the transcription asynchronously, we need to poll for its status every so often. We'll check every 5 seconds.

We'll enumerate all the transcriptions that this Speech service resource is processing and look for the one we created.

Here's the polling code, with status display for everything except a successful completion; we'll handle that next.

logging.info("Checking status.")

completed = False

while not completed:
    running, not_started = 0, 0

    # get all transcriptions for the user
    transcriptions: List[cris_client.Transcription] = transcription_api.get_transcriptions()

    # for each transcription in the list we check the status
    for transcription in transcriptions:
        if transcription.status in ("Failed", "Succeeded"):
            # we check to see if it was the transcription we created from this client
            if created_transcription != transcription.id:
                continue

            completed = True

            if transcription.status == "Succeeded":
            else:
                logging.info("Transcription failed :{}.".format(transcription.status_message))
                break
        elif transcription.status == "Running":
            running += 1
        elif transcription.status == "NotStarted":
            not_started += 1

    logging.info("Transcriptions status: "
            "completed (this transcription): {}, {} running, {} not started yet".format(
                completed, running, not_started))

    # wait for 5 seconds
    time.sleep(5)

Display the transcription results

Once the service has successfully completed the transcription, the results are stored at another URL that we can get from the status response.

Here we get that result JSON and display it.

results_uri = transcription.results_urls["channel_0"]
results = requests.get(results_uri)
logging.info("Transcription succeeded. Results: ")
logging.info(results.content.decode("utf-8"))
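
If you want more than a raw JSON dump, you can parse the result and print just the best display text of each segment, much as the Java sample above does. A minimal sketch, assuming the same v2.0 result shape (AudioFileResults / SegmentResults / NBest) used there:

# Sketch: print only the top display text of each successfully recognized
# segment. Field names assume the v2.0 result schema shown in the Java sample.
import json

root = json.loads(results.content.decode("utf-8"))
for audio_file in root["AudioFileResults"]:
    for segment in audio_file["SegmentResults"]:
        if segment["RecognitionStatus"].lower() == "success" and segment["NBest"]:
            print(segment["NBest"][0]["Display"])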

Check your code

At this point, your code should look like this (we've added some comments to this version):

#!/usr/bin/env python
# coding: utf-8

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE.md file in the project root for full license information.

from typing import List

import logging
import sys
import requests
import time
import swagger_client as cris_client


logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, format="%(message)s")

# Your subscription key and region for the Speech service
SUBSCRIPTION_KEY = "YourSubscriptionKey"
SERVICE_REGION = "YourServiceRegion"

NAME = "Simple transcription"
DESCRIPTION = "Simple transcription description"

LOCALE = "en-US"
RECORDINGS_BLOB_URI = "<Your SAS Uri to the recording>"

# Set subscription information when doing transcription with custom models
ADAPTED_ACOUSTIC_ID = None  # guid of a custom acoustic model
ADAPTED_LANGUAGE_ID = None  # guid of a custom language model


def transcribe():
    logging.info("Starting transcription client...")

    # configure API key authorization: subscription_key
    configuration = cris_client.Configuration()
    configuration.api_key['Ocp-Apim-Subscription-Key'] = SUBSCRIPTION_KEY
    configuration.host = "https://{}.cris.azure.cn".format(SERVICE_REGION)

    # create the client object and authenticate
    client = cris_client.ApiClient(configuration)

    # create an instance of the transcription api class
    transcription_api = cris_client.CustomSpeechTranscriptionsApi(api_client=client)

    # Use base models for transcription. Comment this block if you are using a custom model.
    # Note: you can specify additional transcription properties by passing a
    # dictionary in the properties parameter. See
    # https://docs.azure.cn/cognitive-services/speech-service/batch-transcription
    # for supported parameters.
    transcription_definition = cris_client.TranscriptionDefinition(
        name=NAME, description=DESCRIPTION, locale=LOCALE, recordings_url=RECORDINGS_BLOB_URI
    )

    # Uncomment this block to use custom models for transcription.
    # Model information (ADAPTED_ACOUSTIC_ID and ADAPTED_LANGUAGE_ID) must be set above.
    # if ADAPTED_ACOUSTIC_ID is None or ADAPTED_LANGUAGE_ID is None:
    #     logging.info("Custom model ids must be set to when using custom models")
    # transcription_definition = cris_client.TranscriptionDefinition(
    #     name=NAME, description=DESCRIPTION, locale=LOCALE, recordings_url=RECORDINGS_BLOB_URI,
    #     models=[cris_client.ModelIdentity(ADAPTED_ACOUSTIC_ID), cris_client.ModelIdentity(ADAPTED_LANGUAGE_ID)]
    # )

    data, status, headers = transcription_api.create_transcription_with_http_info(transcription_definition)

    # extract transcription location from the headers
    transcription_location: str = headers["location"]

    # get the transcription Id from the location URI
    created_transcription: str = transcription_location.split('/')[-1]

    logging.info("Created new transcription with id {}".format(created_transcription))

    logging.info("Checking status.")

    completed = False

    while not completed:
        running, not_started = 0, 0

        # get all transcriptions for the user
        transcriptions: List[cris_client.Transcription] = transcription_api.get_transcriptions()

        # for each transcription in the list we check the status
        for transcription in transcriptions:
            if transcription.status in ("Failed", "Succeeded"):
                # we check to see if it was the transcription we created from this client
                if created_transcription != transcription.id:
                    continue

                completed = True

                if transcription.status == "Succeeded":
                    results_uri = transcription.results_urls["channel_0"]
                    results = requests.get(results_uri)
                    logging.info("Transcription succeeded. Results: ")
                    logging.info(results.content.decode("utf-8"))
                else:
                    logging.info("Transcription failed :{}.".format(transcription.status_message))
                    break
            elif transcription.status == "Running":
                running += 1
            elif transcription.status == "NotStarted":
                not_started += 1

        logging.info("Transcriptions status: "
                "completed (this transcription): {}, {} running, {} not started yet".format(
                    completed, running, not_started))

        # wait for 5 seconds
        time.sleep(5)

    input("Press any key...")


if __name__ == "__main__":
    transcribe()

Build and run your app

Now you're ready to build your app and test speech recognition using the Speech service.

Next steps

In this quickstart, you will use a REST API to recognize speech from files in a batch process. A batch process executes the speech transcription without any user interactions. It gives you a simple programming model, without the need to manage concurrency, custom speech models, or other details. It offers advanced control options, while making efficient use of Azure Speech service resources.

For more information on the available options and configuration details, see batch transcription.

The following quickstart will walk you through a usage sample.

If you prefer to jump right in, view or download all Speech SDK JavaScript Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Create a new JS file

The first step is to make sure that you have your project open in your favorite editor.

Name your file index.js.

Start with some boilerplate code

Let's add some code that works as a skeleton for our project.

const https = require("https");

// Replace with your subscription key
const SubscriptionKey = "YourSubscriptionKey";

// Update with your service region
const Region = "YourServiceRegion";
const Port = 443;

// Recordings and locale
const Locale = "en-US";
const RecordingsBlobUri = "YourFileUrl";

// Name and description
const Name = "Simple transcription";
const Description = "Simple transcription description";

const SpeechToTextBasePath = "/api/speechtotext/v2.0/";

You'll need to replace the following values:

  • YourSubscriptionKey: found on the Keys page of the Azure portal for the Speech resource
  • YourServiceRegion: found on the Overview page of the Azure portal for the Speech resource
  • YourFileUrl: found under the Blob service / Containers page of the Azure portal for the Storage account resource
    • Select the appropriate container
    • Select the desired blob
    • Copy the URL under the Properties page

JSON wrappers

The REST API takes requests in JSON format and also returns results in JSON. To make the requests and responses easier to understand, we'll declare a few classes to use for serializing and deserializing the JSON.

class ModelIdentity {
    id;
}

class Transcription {
    Name;
    Description;
    Locale;
    RecordingsUrl;
    ResultsUrls;
    Id;
    CreatedDateTime;
    LastActionDateTime;
    Status;
    StatusMessage;
}

class TranscriptionDefinition {
    Name;
    Description;
    RecordingsUrl;
    Locale;
    Models;
    Properties;
}

Create an initial transcription request

Next, we'll generate the transcription request.

const ts = {
    Name: Name,
    Description: Description,
    Locale: Locale,
    RecordingsUrl: RecordingsBlobUri,
    Properties: {
        "PunctuationMode": "DictatedAndAutomatic",
        "ProfanityFilterMode": "Masked",
        "AddWordLevelTimestamps": "True"
    },
    Models: []
}

const postPayload = JSON.stringify(ts);

const startOptions = {
    hostname: Region + ".cris.azure.cn",
    port: Port,
    path: SpeechToTextBasePath + "Transcriptions/",
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(postPayload),
        "Ocp-Apim-Subscription-Key": SubscriptionKey
    }
}

Send the transcription request

Now we post the request to the Speech service and check the initial response code. This response code simply indicates whether the service has received the request. The service returns a URL in the response headers; that's the location where it will store the transcription status.

Then we'll call a method CheckTranscriptionStatus to check on the status and eventually print the results. We'll implement CheckTranscriptionStatus next.

const request = https.request(startOptions, (response) => {
    if (response.statusCode != 202) {
        console.error("Error, status code " + response.statusCode);
    } else {

        const transcriptionLocation = response.headers.location;

        console.info("Created transcription at location " + transcriptionLocation);
        console.info("Checking status.");

        CheckTranscriptionStatus(transcriptionLocation);
    }
});

request.on("error", error => {
    console.error(error);
});

request.write(postPayload);
request.end();

Check the request's status

Since the service processes the transcription asynchronously, we need to poll for its status every so often. We'll check every 5 seconds.

We can check the status by retrieving the content at the URL we got when we posted the request. When we get the content back, we deserialize it into one of our helper classes to make it easier to interact with.

Here's the polling code, with status display for everything except a successful completion; we'll handle that next.

CheckTranscriptionStatus takes the status URL from the transcription request and polls it every 5 seconds until it indicates success or an error. It then calls PrintResults to print the results of the transcription. We'll implement PrintResults next.

function CheckTranscriptionStatus(statusUrl) {
    const fetchOptions = {
        headers: {
            "Ocp-Apim-Subscription-Key": SubscriptionKey
        }
    }

    const fetchRequest = https.get(new URL(statusUrl), fetchOptions, (response) => {
        if (response.statusCode !== 200) {
            console.info("Error retrieving status: " + response.statusCode);
        } else {
            let responseText = '';
            response.setEncoding('utf8');
            response.on("data", (chunk) => {
                responseText += chunk;
            });

            response.on("end", () => {
                const statusObject = JSON.parse(responseText);

                var done = false;
                switch (statusObject.status) {
                    case "Failed":
                        console.info("Transcription failed. Status: " + transcription.StatusMessage);
                        done = true;
                        break;
                    case "Succeeded":
                        done = true;
                        PrintResults(statusObject.resultsUrls["channel_0"]);
                        break;
                    case "Running":
                        console.info("Transcription is still running.");
                        break;
                    case "NotStarted":
                        console.info("Transcription has not started.");
                        break;
                }

                if (!done) {
                    setTimeout(() => {
                        CheckTranscriptionStatus(statusUrl);
                    }, (5000));
                }
            });
        }
    });

    fetchRequest.on("error", error => {
        console.error(error);
    });
}

Display the transcription results

Once the service has successfully completed the transcription, the results are stored at another URL that we can get from the status response. Here we make a request to that URL, read the response body as it streams in, and print the results to the console.

function PrintResults(resultUrl)
{
    const fetchOptions = {
        headers: {
            "Ocp-Apim-Subscription-Key": SubscriptionKey
        }
    }

    const fetchRequest = https.get(new URL(resultUrl), fetchOptions, (response) => {
        if (response.statusCode !== 200) {
            console.info("Error retrieving status: " + response.statusCode);
        } else {
            let responseText = '';
            response.setEncoding('utf8');
            response.on("data", (chunk) => {
                responseText += chunk;
            });

            response.on("end", () => {
                console.info("Transcription Results:");
                console.info(responseText);
            });
        }
    });
}

Check your code

At this point, your code should look like this:

const https = require("https");

// Replace with your subscription key
const SubscriptionKey = "YourSubscriptionKey";

// Update with your service region
const Region = "YourServiceRegion";
const Port = 443;

// Recordings and locale
const Locale = "en-US";
const RecordingsBlobUri = "YourFileUrl";

// Name and description
const Name = "Simple transcription";
const Description = "Simple transcription description";

const SpeechToTextBasePath = "/api/speechtotext/v2.0/";

class ModelIdentity {
    id;
}

class Transcription {
    Name;
    Description;
    Locale;
    RecordingsUrl;
    ResultsUrls;
    Id;
    CreatedDateTime;
    LastActionDateTime;
    Status;
    StatusMessage;
}

class TranscriptionDefinition {
    Name;
    Description;
    RecordingsUrl;
    Locale;
    Models;
    Properties;
}

const ts = {
    Name: Name,
    Description: Description,
    Locale: Locale,
    RecordingsUrl: RecordingsBlobUri,
    Properties: {
        "PunctuationMode": "DictatedAndAutomatic",
        "ProfanityFilterMode": "Masked",
        "AddWordLevelTimestamps": "True"
    },
    Models: []
}

const postPayload = JSON.stringify(ts);

const startOptions = {
    hostname: Region + ".cris.azure.cn",
    port: Port,
    path: SpeechToTextBasePath + "Transcriptions/",
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(postPayload),
        "Ocp-Apim-Subscription-Key": SubscriptionKey
    }
}

function PrintResults(resultUrl)
{
    const fetchOptions = {
        headers: {
            "Ocp-Apim-Subscription-Key": SubscriptionKey
        }
    }

    const fetchRequest = https.get(new URL(resultUrl), fetchOptions, (response) => {
        if (response.statusCode !== 200) {
            console.info("Error retrieving status: " + response.statusCode);
        } else {
            let responseText = '';
            response.setEncoding('utf8');
            response.on("data", (chunk) => {
                responseText += chunk;
            });

            response.on("end", () => {
                console.info("Transcription Results:");
                console.info(responseText);
            });
        }
    });
}

function CheckTranscriptionStatus(statusUrl) {
    const fetchOptions = {
        headers: {
            "Ocp-Apim-Subscription-Key": SubscriptionKey
        }
    }

    const fetchRequest = https.get(new URL(statusUrl), fetchOptions, (response) => {
        if (response.statusCode !== 200) {
            console.info("Error retrieving status: " + response.statusCode);
        } else {
            let responseText = '';
            response.setEncoding('utf8');
            response.on("data", (chunk) => {
                responseText += chunk;
            });

            response.on("end", () => {
                const statusObject = JSON.parse(responseText);

                var done = false;
                switch (statusObject.status) {
                    case "Failed":
                        console.info("Transcription failed. Status: " + transcription.StatusMessage);
                        done = true;
                        break;
                    case "Succeeded":
                        done = true;
                        PrintResults(statusObject.resultsUrls["channel_0"]);
                        break;
                    case "Running":
                        console.info("Transcription is still running.");
                        break;
                    case "NotStarted":
                        console.info("Transcription has not started.");
                        break;
                }

                if (!done) {
                    setTimeout(() => {
                        CheckTranscriptionStatus(statusUrl);
                    }, (5000));
                }
            });
        }
    });

    fetchRequest.on("error", error => {
        console.error(error);
    });
}

const request = https.request(startOptions, (response) => {
    if (response.statusCode != 202) {
        console.error("Error, status code " + response.statusCode);
    } else {

        const transcriptionLocation = response.headers.location;

        console.info("Created transcription at location " + transcriptionLocation);
        console.info("Checking status.");

        CheckTranscriptionStatus(transcriptionLocation);
    }
});

request.on("error", error => {
    console.error(error);
});

request.write(postPayload);
request.end();

Run your app

Now you're ready to run your app and test speech recognition using the Speech service.

Start your app - run node index.js.

Next steps

View or download all Speech SDK Samples on GitHub.

Additional language and platform support

If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. Don't worry, we have additional quickstart materials and code samples available on GitHub. Use the table to find the right sample for your programming language and platform/OS combination.

Important

Speech SDK version 1.11.0 or later is required.

| Language | Additional Quickstarts | Code samples |
| --- | --- | --- |
| C# | From mic, From file | .NET Framework, .NET Core, UWP, Unity, Xamarin |
| C++ | From mic, From file | Windows, Linux, macOS |
| Java | From mic, From file | Android, JRE |
| JavaScript | Browser from mic, Node.js from file | Windows, Linux, macOS |
| Objective-C | iOS from mic, macOS from mic | iOS, macOS |
| Python | From mic, From file | Windows, Linux, macOS |
| Swift | iOS from mic, macOS from mic | iOS, macOS |