Quickstart: Translate speech-to-text

Important

Speech SDK version 1.11.0 or later is required.

In this quickstart, you use the Speech SDK to interactively translate speech in one language into text in another. After satisfying a few prerequisites, translating speech to text takes only five steps:

  • Create a SpeechConfig object from your speech host and subscription key.
  • Update the SpeechConfig object to specify the source and target languages.
  • Create a TranslationRecognizer object using the SpeechConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK C# Samples on GitHub. Otherwise, let's get started.

Choose your target environment

Prerequisites

Before you get started, make sure to:

Add sample code

  1. Open Program.cs, and replace all the code in it with the following.

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Translation;
    
    namespace helloworld
    {
        class Program
        {
            public static async Task TranslateSpeechToText()
            {
                // Creates an instance of a speech translation config with specified subscription key and service region.
                // Replace with your own subscription key and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
                var config = SpeechTranslationConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
                // Sets source and target languages.
                // Replace with the languages of your choice, from list found here: https://docs.azure.cn/cognitive-services/speech-service/language-support#speech-translation
                string fromLanguage = "en-US";
                string toLanguage = "de";
                config.SpeechRecognitionLanguage = fromLanguage;
                config.AddTargetLanguage(toLanguage);
    
                // Creates a translation recognizer using the default microphone audio input device.
                using (var recognizer = new TranslationRecognizer(config))
                {
                    // Starts translation, and returns after a single utterance is recognized. The end of a
                    // single utterance is determined by listening for silence at the end or until a maximum of 15
                    // seconds of audio is processed. The task returns the recognized text as well as the translation.
                    // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
                    // shot recognition like command or query.
                    // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
                    Console.WriteLine("Say something...");
                    var result = await recognizer.RecognizeOnceAsync();
    
                    // Checks result.
                    if (result.Reason == ResultReason.TranslatedSpeech)
                    {
                        Console.WriteLine($"RECOGNIZED '{fromLanguage}': {result.Text}");
                        Console.WriteLine($"TRANSLATED into '{toLanguage}': {result.Translations[toLanguage]}");
                    }
                    else if (result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"RECOGNIZED '{fromLanguage}': {result.Text} (text could not be translated)");
                    }
                    else if (result.Reason == ResultReason.NoMatch)
                    {
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    }
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        var cancellation = CancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                            Console.WriteLine($"CANCELED: Did you update the subscription info?");
                        }
                    }
                }
            }
    
            static void Main(string[] args)
            {
                TranslateSpeechToText().Wait();
            }
        }
    }
    
  2. In the same file, replace the string YourSubscriptionKey with your subscription key.

  3. Replace the string YourServiceRegion with the region associated with your subscription (for example, chinaeast2 for a trial subscription).

  4. From the menu bar, choose File > Save All.

Build and run the application

  1. From the menu bar, select Build > Build Solution to build the application. The code should compile without errors.

  2. Choose Debug > Start Debugging (or press F5) to start the helloworld application.

  3. Speak an English phrase or sentence. The application transmits your speech to the Speech service, which translates and transcribes it to text (in this case, to German). The Speech service then sends the text back to the application for display.

Say something...
RECOGNIZED 'en-US': What's the weather in Seattle?
TRANSLATED into 'de': Wie ist das Wetter in Seattle?

Next steps

In this quickstart, you use the Speech SDK to interactively translate speech in one language into text in another. After satisfying a few prerequisites, translating speech to text takes only five steps:

  • Create a SpeechConfig object from your speech host and subscription key.
  • Update the SpeechConfig object to specify the source and target languages.
  • Create a TranslationRecognizer object using the SpeechConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK C++ Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Add sample code

  1. Open the source file helloworld.cpp.

  2. Replace all the code with the following snippet:

    #include <iostream>
    #include <vector>
    #include <speechapi_cxx.h>
    
    using namespace std;
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Translation;
    
    void TranslateSpeechToText()
    {
        // Creates an instance of a speech translation config with specified subscription key and service region.
        // Replace with your own subscription key and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
        auto config = SpeechTranslationConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
        // Sets source and target languages.
        // Replace with the languages of your choice, from list found here: https://docs.azure.cn/cognitive-services/speech-service/language-support#speech-translation
        auto fromLanguage = "en-US";
        auto toLanguage = "de";
        config->SetSpeechRecognitionLanguage(fromLanguage);
        config->AddTargetLanguage(toLanguage);
    
        // Creates a translation recognizer using the default microphone audio input device.
        auto recognizer = TranslationRecognizer::FromConfig(config);
    
        // Starts translation, and returns after a single utterance is recognized. The end of a
        // single utterance is determined by listening for silence at the end or until a maximum of 15
        // seconds of audio is processed. The task returns the recognized text as well as the translation.
        // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
        // shot recognition like command or query.
        // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
        cout << "Say something...\n";
        auto result = recognizer->RecognizeOnceAsync().get();
    
        // Checks result.
        if (result->Reason == ResultReason::TranslatedSpeech)
        {
            cout << "RECOGNIZED '" << fromLanguage << "': " << result->Text << std::endl;
            cout << "TRANSLATED into '" << toLanguage << "': " << result->Translations.at(toLanguage) << std::endl;
        }
        else if (result->Reason == ResultReason::RecognizedSpeech)
        {
            cout << "RECOGNIZED '" << fromLanguage << "': " << result->Text << " (text could not be translated)" << std::endl;
        }
        else if (result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Speech could not be recognized." << std::endl;
        }
        else if (result->Reason == ResultReason::Canceled)
        {
            auto cancellation = CancellationDetails::FromResult(result);
            cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
            if (cancellation->Reason == CancellationReason::Error)
            {
                cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                cout << "CANCELED: Did you update the subscription info?" << std::endl;
            }
        }
    }
    
    int wmain()
    {
        TranslateSpeechToText();
        return 0;
    }
    
  3. In the same file, replace the string YourSubscriptionKey with your subscription key.

  4. Replace the string YourServiceRegion with the region associated with your subscription (for example, chinaeast2 for a trial subscription).

  5. From the menu bar, choose File > Save All.

Build and run the application

  1. From the menu bar, select Build > Build Solution to build the application. The code should compile without errors.

  2. Choose Debug > Start Debugging (or press F5) to start the helloworld application.

  3. Speak an English phrase or sentence. The application transmits your speech to the Speech service, which translates and transcribes it to text (in this case, to German). The Speech service then sends the text back to the application for display.

Say something...
RECOGNIZED 'en-US': What's the weather in Seattle?
TRANSLATED into 'de': Wie ist das Wetter in Seattle?

Next steps


In this quickstart, you use the Speech SDK to interactively translate speech in one language into text in another. After satisfying a few prerequisites, translating speech to text takes only five steps:

  • Create a SpeechConfig object from your speech host and subscription key.
  • Update the SpeechConfig object to specify the source and target languages.
  • Create a TranslationRecognizer object using the SpeechConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK Java Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Add sample code

  1. To add a new empty class to your Java project, select File > New > Class.

  2. In the New Java Class window, enter speechsdk.quickstart into the Package field, and Main into the Name field.

    Screenshot of the New Java Class window

  3. Replace all code in Main.java with the following snippet:

    package speechsdk.quickstart;
    
    import java.io.IOException;
    import java.util.concurrent.Future;
    import java.util.concurrent.ExecutionException;
    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.translation.*;
    
    public class Main {
    
        public static void translationWithMicrophoneAsync() throws InterruptedException, ExecutionException, IOException
        {
            // Creates an instance of a speech translation config with specified
            // host and subscription key. Replace with your own subscription key
            // and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
    
            int exitCode = 1;
            SpeechTranslationConfig config = SpeechTranslationConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
            assert(config != null);
    
            // Sets source and target languages.
            String fromLanguage = "en-US";
            String toLanguage = "de";
            config.setSpeechRecognitionLanguage(fromLanguage);
            config.addTargetLanguage(toLanguage);
    
            // Creates a translation recognizer using the default microphone audio input device.
            TranslationRecognizer recognizer = new TranslationRecognizer(config);
            assert(recognizer != null);
    
            System.out.println("Say something...");
    
            // Starts translation, and returns after a single utterance is recognized. The end of a
            // single utterance is determined by listening for silence at the end or until a maximum of 15
            // seconds of audio is processed. The task returns the recognized text as well as the translation.
            // Note: Since recognizeOnceAsync() returns only a single utterance, it is suitable only for single
            // shot recognition like command or query.
            // For long-running multi-utterance recognition, use startContinuousRecognitionAsync() instead.
            Future<TranslationRecognitionResult> task = recognizer.recognizeOnceAsync();
            assert(task != null);
    
            TranslationRecognitionResult result = task.get();
            assert(result != null);
    
            if (result.getReason() == ResultReason.TranslatedSpeech) {
                System.out.println("RECOGNIZED '" + fromLanguage + "': " + result.getText());
                System.out.println("TRANSLATED into '" + toLanguage + "': " + result.getTranslations().get(toLanguage));
                exitCode = 0;
            }
            else if (result.getReason() == ResultReason.RecognizedSpeech) {
            System.out.println("RECOGNIZED '" + fromLanguage + "': " + result.getText() + " (text could not be translated)");
                exitCode = 0;
            }
            else if (result.getReason() == ResultReason.NoMatch) {
                System.out.println("NOMATCH: Speech could not be recognized.");
            }
            else if (result.getReason() == ResultReason.Canceled) {
                CancellationDetails cancellation = CancellationDetails.fromResult(result);
                System.out.println("CANCELED: Reason=" + cancellation.getReason());
    
                if (cancellation.getReason() == CancellationReason.Error) {
                    System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                    System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                    System.out.println("CANCELED: Did you update the subscription info?");
                }
            }
    
            recognizer.close();
    
            System.exit(exitCode);
        }
    
        public static void main(String[] args) {
            try {
                translationWithMicrophoneAsync();
            } catch (Exception ex) {
                System.out.println("Unexpected exception: " + ex.getMessage());
                assert(false);
                System.exit(1);
            }
        }
    }
    
  4. Replace the string YourSubscriptionKey with your subscription key.

  5. Replace the string YourServiceRegion with the region associated with your subscription (for example, chinaeast2 for a trial subscription).

  6. Save changes to the project.

Build and run the app

Press F11, or select Run > Debug.

  1. Speak an English phrase or sentence. The application transmits your speech to the Speech service, which translates and transcribes it to text (in this case, to German). The Speech service then sends the text back to the application for display.
Say something...
RECOGNIZED 'en-US': What's the weather in Seattle?
TRANSLATED into 'de': Wie ist das Wetter in Seattle?

Next steps


In this quickstart, you use the Speech SDK to interactively translate speech in one language into text in another. After satisfying a few prerequisites, translating speech to text takes only five steps:

  • Create a SpeechConfig object from your speech host and subscription key.
  • Update the SpeechConfig object to specify the source and target languages.
  • Create a TranslationRecognizer object using the SpeechConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK Python Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Add sample code

  1. Open quickstart.py, and replace all the code in it with the following.

    import azure.cognitiveservices.speech as speechsdk
    
    speech_host, speech_key = "wss://YourServiceRegion.stt.speech.azure.cn/", "YourSubscriptionKey"
    
    def translate_speech_to_text():
    
        # Creates an instance of a speech translation config with specified host and subscription key.
        # Replace with your own subscription key and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
        translation_config = speechsdk.translation.SpeechTranslationConfig(host=speech_host, subscription=speech_key)
    
        # Sets source and target languages.
        # Replace with the languages of your choice, from list found here: https://docs.azure.cn/cognitive-services/speech-service/language-support#speech-translation
        fromLanguage = 'en-US'
        toLanguage = 'de'
        translation_config.speech_recognition_language = fromLanguage
        translation_config.add_target_language(toLanguage)
    
        # Creates a translation recognizer using the default microphone audio input device.
        recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config)
    
        # Starts translation, and returns after a single utterance is recognized. The end of a
        # single utterance is determined by listening for silence at the end or until a maximum of 15
        # seconds of audio is processed. It returns the recognized text as well as the translation.
        # Note: Since recognize_once() returns only a single utterance, it is suitable only for single
        # shot recognition like command or query.
        # For long-running multi-utterance recognition, use start_continuous_recognition() instead.
        print("Say something...")
        result = recognizer.recognize_once()
    
        # Check the result
        if result.reason == speechsdk.ResultReason.TranslatedSpeech:
            print("RECOGNIZED '{}': {}".format(fromLanguage, result.text))
            print("TRANSLATED into {}: {}".format(toLanguage, result.translations[toLanguage]))
        elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("RECOGNIZED: {} (text could not be translated)".format(result.text))
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print("NOMATCH: Speech could not be recognized: {}".format(result.no_match_details))
        elif result.reason == speechsdk.ResultReason.Canceled:
            print("CANCELED: Reason={}".format(result.cancellation_details.reason))
            if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
                print("CANCELED: ErrorDetails={}".format(result.cancellation_details.error_details))
    
    translate_speech_to_text()
    
  2. In the same file, replace the string YourSubscriptionKey with your subscription key.

  3. Replace the string YourServiceRegion with the region associated with your subscription (for example, chinaeast2 for a trial subscription).

  4. Save the changes you've made to quickstart.py.
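Note that, unlike the other samples, this one passes a host URL rather than a region name. If it helps, the host string used above can be derived from a region identifier with a small helper (a hypothetical convenience function for illustration, not part of the SDK):

```python
def build_speech_host(region: str) -> str:
    # Builds the Azure China speech-to-text WebSocket host used in the
    # sample above from a region identifier such as "chinaeast2".
    return "wss://{}.stt.speech.azure.cn/".format(region)
```

For example, `build_speech_host("chinaeast2")` produces the host string to assign to speech_host.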

Build and run your app

  1. Run the sample from the console or in your IDE:

    python quickstart.py
    
  2. Speak an English phrase or sentence. The application transmits your speech to the Speech service, which translates and transcribes it to text (in this case, to German). The Speech service then sends the text back to the application for display.

    Say something...
    RECOGNIZED 'en-US': What's the weather in Seattle?
    TRANSLATED into 'de': Wie ist das Wetter in Seattle?
    

Next steps

In this quickstart, you use the Speech SDK to interactively translate speech in one language into text in another. After satisfying a few prerequisites, translating speech to text takes only five steps:

  • Create a SpeechConfig object from your speech host and subscription key.
  • Update the SpeechConfig object to specify the source and target languages.
  • Create a TranslationRecognizer object using the SpeechConfig object from above.
  • Using the TranslationRecognizer object, start the recognition process for a single utterance.
  • Inspect the TranslationRecognitionResult returned.

If you prefer to jump right in, view or download all Speech SDK JavaScript Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started:

Create a new Website folder

Create a new, empty folder. If you want to host the sample on a web server, make sure that the web server can access the folder.
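For local testing, a quick way to serve the folder is Python's built-in static file server (a convenience sketch using only the standard library; any static web server works just as well):

```python
# Serve the sample folder over HTTP for local testing.
# This is only one option -- IIS, nginx, or any static file server
# that can reach the folder works equally well.
from http.server import HTTPServer, SimpleHTTPRequestHandler

def make_server(port: int = 8000) -> HTTPServer:
    """Create a server that serves files from the current folder.

    Call serve_forever() on the returned server to start handling
    requests; passing port 0 picks a free port automatically.
    """
    return HTTPServer(("localhost", port), SimpleHTTPRequestHandler)
```

Run `make_server().serve_forever()` from inside the sample folder, then open http://localhost:8000/index.html in a browser.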

Unpack the Speech SDK for JavaScript into that folder

Download the Speech SDK as a .zip package and unpack it into the newly created folder. This unpacks two files, microsoft.cognitiveservices.speech.sdk.bundle.js and microsoft.cognitiveservices.speech.sdk.bundle.js.map. The latter file is optional, and is useful for debugging into the SDK code.

Create an index.html page

Create a new file named index.html in the folder, and open it with a text editor.

  1. Create the following HTML skeleton:

    <!DOCTYPE html>
    <html>
    <head>
      <title>Microsoft Cognitive Services Speech SDK JavaScript Quickstart</title>
      <meta charset="utf-8" />
    </head>
    <body style="font-family:'Helvetica Neue',Helvetica,Arial,sans-serif; font-size:13px;">
      <!-- <uidiv> -->
      <div id="warning">
        <h1 style="font-weight:500;">Speech Recognition Speech SDK not found (microsoft.cognitiveservices.speech.sdk.bundle.js missing).</h1>
      </div>
    
      <div id="content" style="display:none">
        <table width="100%">
          <tr>
            <td></td>
            <td><h1 style="font-weight:500;">Microsoft Cognitive Services Speech SDK JavaScript Quickstart</h1></td>
          </tr>
          <tr>
            <td align="right"><a href="https://docs.azure.cn/cognitive-services/speech-service/get-started" target="_blank">Subscription</a>:</td>
            <td><input id="subscriptionKey" type="text" size="40" value="subscription"></td>
          </tr>
          <tr>
            <td align="right">Region</td>
            <td><input id="serviceRegion" type="text" size="40" value="YourServiceRegion"></td>
          </tr>
          <tr>
            <td align="right">Source Language</td>
            <td><select id="languageSourceOptions">
                <option value="ar-EG">Arabic - EG</option>
                <option selected="selected" value="de-DE">German - DE</option>
                <option value="en-US">English - US</option>
                <option value="es-ES">Spanish - ES</option>
                <option value="fr-FR">French - FR</option>
                <option value="hi-IN">Hindi - IN</option>
                <option value="ja-JP">Japanese - JP</option>
                <option value="ko-KR">Korean - KR</option>
                <option value="ru-RU">Russian - RU</option>
                <option value="zh-CN">Chinese - CN</option>
            </select></td>
          </tr>
          <tr>
            <td align="right">Target Language</td>
            <td><select id="languageTargetOptions">
                <option value="ar-EG">Arabic - EG</option>
                <option selected="selected" value="de-DE">German - DE</option>
                <option value="en-US">English - US</option>
                <option value="es-ES">Spanish - ES</option>
                <option value="fr-FR">French - FR</option>
                <option value="hi-IN">Hindi - IN</option>
                <option value="ja-JP">Japanese - JP</option>
                <option value="ko-KR">Korean - KR</option>
                <option value="ru-RU">Russian - RU</option>
                <option value="zh-CN">Chinese - CN</option>
            </select></td>
          </tr>
          <tr>
            <td></td>
            <td><button id="startRecognizeOnceAsyncButton">Start recognition</button></td>
          </tr>
          <tr>
            <td align="right" valign="top">Results</td>
            <td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:200px"></textarea></td>
          </tr>
        </table>
      </div>
      <!-- </uidiv> -->
    
      <!-- <speechsdkdiv> -->
      <!-- Speech SDK reference sdk. -->
      <script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>
      <!-- </speechsdkdiv> -->
    
      <!-- <authorizationfunction> -->
      <!-- Speech SDK Authorization token -->
      <script>
      // Note: Replace the URL with a valid endpoint to retrieve
      //       authorization tokens for your subscription.
      var authorizationEndpoint = "token.php";
    
      function RequestAuthorizationToken() {
        if (authorizationEndpoint) {
          var a = new XMLHttpRequest();
          a.open("GET", authorizationEndpoint);
          a.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
          a.send("");
          a.onload = function() {
            var token = JSON.parse(atob(this.responseText.split(".")[1]));
            serviceRegion.value = token.region;
            authorizationToken = this.responseText;
            subscriptionKey.disabled = true;
            subscriptionKey.value = "using authorization token (hit F5 to refresh)";
            console.log("Got an authorization token: " + token);
          }
        }
      }
      </script>
      <!-- </authorizationfunction> -->
    
      <!-- <quickstartcode> -->
      <!-- Speech SDK USAGE -->
      <script>
        // status fields and start button in UI
        var phraseDiv;
        var startRecognizeOnceAsyncButton;
    
        // subscription key and region for speech services.
        var subscriptionKey, serviceRegion, languageTargetOptions, languageSourceOptions;
        var authorizationToken;
        var SpeechSDK;
        var recognizer;
    
        document.addEventListener("DOMContentLoaded", function () {
          startRecognizeOnceAsyncButton = document.getElementById("startRecognizeOnceAsyncButton");
          subscriptionKey = document.getElementById("subscriptionKey");
          serviceRegion = document.getElementById("serviceRegion");
          languageTargetOptions = document.getElementById("languageTargetOptions");
          languageSourceOptions = document.getElementById("languageSourceOptions");
          phraseDiv = document.getElementById("phraseDiv");
    
          startRecognizeOnceAsyncButton.addEventListener("click", function () {
            startRecognizeOnceAsyncButton.disabled = true;
            phraseDiv.innerHTML = "";
    
            // if we got an authorization token, use the token. Otherwise use the provided subscription key
            var speechConfig;
            if (authorizationToken) {
              speechConfig = SpeechSDK.SpeechTranslationConfig.fromAuthorizationToken(authorizationToken, serviceRegion.value);
            } else {
              if (subscriptionKey.value === "" || subscriptionKey.value === "subscription") {
                alert("Please enter your Microsoft Cognitive Services Speech subscription key!");
                startRecognizeOnceAsyncButton.disabled = false;
                return;
              }
              speechConfig = SpeechSDK.SpeechTranslationConfig.fromSubscription(subscriptionKey.value, serviceRegion.value);
            }
    
            speechConfig.speechRecognitionLanguage = languageSourceOptions.value;
            let language = languageTargetOptions.value;
            speechConfig.addTargetLanguage(language);
    
            var audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
            recognizer = new SpeechSDK.TranslationRecognizer(speechConfig, audioConfig);
    
            recognizer.recognizeOnceAsync(
              function (result) {
                startRecognizeOnceAsyncButton.disabled = false;
                let languageKey = language.substring(0, 2);
                let translation = result.translations.get(languageKey);
                window.console.log(translation);
                phraseDiv.innerHTML += translation;
    
                recognizer.close();
                recognizer = undefined;
              },
              function (err) {
                startRecognizeOnceAsyncButton.disabled = false;
                phraseDiv.innerHTML += err;
                window.console.log(err);
    
                recognizer.close();
                recognizer = undefined;
              });
          });
    
          if (!!window.SpeechSDK) {
            SpeechSDK = window.SpeechSDK;
            startRecognizeOnceAsyncButton.disabled = false;
    
            document.getElementById('content').style.display = 'block';
            document.getElementById('warning').style.display = 'none';
    
            // in case we have a function for getting an authorization token, call it.
            if (typeof RequestAuthorizationToken === "function") {
              RequestAuthorizationToken();
            }
          }
        });
      </script>
      <!-- </quickstartcode> -->
    </body>
    </html>
    

Create the token source (optional)

In case you want to host the web page on a web server, you can optionally provide a token source for your demo application. That way, your subscription key never leaves your server, while users can still use speech capabilities without entering any authorization code themselves.

Create a new file named token.php. This example assumes your web server supports the PHP scripting language. Enter the following code:

<?php
header('Access-Control-Allow-Origin: ' . $_SERVER['SERVER_NAME']);

// Replace with your own subscription key and service region (e.g., "chinaeast2").
$subscriptionKey = 'YourSubscriptionKey';
$region = 'YourServiceRegion';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://' . $region . '.api.cognitive.azure.cn/sts/v1.0/issueToken');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, '{}');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json', 'Ocp-Apim-Subscription-Key: ' . $subscriptionKey));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo curl_exec($ch);
?>
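
On the client side, the sample's `RequestAuthorizationToken` function (called near the end of the script above, if defined) would fetch the token from this endpoint. A minimal sketch, assuming the page and token.php are served from the same origin; the helper names here are illustrative, not SDK APIs:

```javascript
// Build the token endpoint URL relative to a given origin.
// token.php is the example endpoint created above, not an SDK convention.
function tokenEndpoint(origin) {
  return origin.replace(/\/$/, "") + "/token.php";
}

// Sketch of a RequestAuthorizationToken implementation using
// XMLHttpRequest, matching the callback style used in the sample.
function requestAuthorizationToken(origin, onToken) {
  var request = new XMLHttpRequest();
  request.open("GET", tokenEndpoint(origin));
  request.onload = function () {
    if (request.status === 200) {
      // The response body is the raw authorization token.
      onToken(request.responseText);
    }
  };
  request.send();
}
```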

Note

Authorization tokens have a limited lifetime. This simplified example does not show how to refresh them automatically. As a user, you can manually reload the page or press F5 to refresh.
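
Speech service authorization tokens are valid for roughly 10 minutes, so a production page would typically re-request the token on a timer slightly shorter than its lifetime. A minimal sketch of that scheduling follows; the interval values are assumptions for illustration, not SDK constants:

```javascript
// Assumed token lifetime (about 10 minutes) and a safety margin,
// so the token is replaced before it actually expires.
var TOKEN_LIFETIME_MS = 10 * 60 * 1000;
var REFRESH_MARGIN_MS = 60 * 1000;

// Refresh a bit before expiry; never return a negative interval.
function refreshIntervalMs(lifetimeMs, marginMs) {
  return Math.max(lifetimeMs - marginMs, 0);
}

// refreshFn should re-run the token request (for example, the
// RequestAuthorizationToken function from this sample).
function scheduleTokenRefresh(refreshFn) {
  return setInterval(refreshFn, refreshIntervalMs(TOKEN_LIFETIME_MS, REFRESH_MARGIN_MS));
}
```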

Build and run the sample locally

To launch the app, double-click the index.html file or open it in your favorite web browser. It presents a simple GUI that lets you enter your subscription key and region, choose the source and target languages, and start translating speech from your microphone.

Build and run the sample via a web server

To launch the app, open your favorite web browser and point it to the public URL where the folder is hosted, enter your region, choose the languages, and start translating speech from your microphone. If configured, the page acquires a token from your token source.

Translate speech with the Speech CLI

In this quickstart, you use the Speech CLI from the command line to translate speech from a microphone into text in another language. After a one-time configuration, the Speech CLI lets you translate speech with commands from the command line.

Prerequisites

The only prerequisite is an Azure Speech subscription. If you don't already have one, see the guide on creating a new subscription.

Download and install

Follow these steps to install the Speech CLI on Windows:

  1. Install either .NET Framework 4.7 or .NET Core 3.0.
  2. Download the Speech CLI zip archive, then extract it.
  3. Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

In the command prompt, change directory to this location, and then type spx to see help for the Speech CLI.

Create subscription config

To start using the Speech CLI, you first need to enter your Speech subscription key and region information. See the region support page to find your region identifier. Once you have your subscription key and region identifier (for example, chinaeast2), run the following commands.

spx config @key --set YOUR-SUBSCRIPTION-KEY
spx config @region --set YOUR-REGION-ID

Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.

Run the Speech CLI

Now you're ready to run the Speech CLI to translate speech into text in a different language.

From the command line, change to the directory that contains the Speech CLI binary, and type:

spx translate --microphone --target de-DE

The Speech CLI translates natural spoken English into text printed in German. Press ENTER to stop the tool.

Note

The Speech CLI defaults to English. You can choose a different language from the speech-to-text table. For example, add --source ja-JP to recognize Japanese speech.

Next steps

Continue exploring the basics to learn about other features of the Speech CLI.

View or download all Speech SDK Samples on GitHub.

Additional language and platform support

Important

Speech SDK version 1.11.0 or later is required.

If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. Don't worry, we have additional quickstart materials and code samples available on GitHub. Use the table to find the right sample for your programming language and platform/OS combination.

Language      Code samples
C#            .NET Framework, .NET Core, UWP, Unity, Xamarin
C++           Windows, Linux, macOS
Java          Android, JRE
JavaScript    Browser, Node.js
Objective-C   iOS, macOS
Python        Windows, Linux, macOS
Swift         iOS, macOS