实现语言识别

语言识别通过将音频中的语言与支持的语言列表进行比较，自动识别其语言。

语言标识 (LID) 用例包括：

语音转文本识别，用于需要识别音频源中的语言并将其转录为文本时。
语音翻译，用于需要识别音频源中的语言并将其翻译为另一种语言时。

对于语音识别，语言标识的初始延迟较高。你应只在需要时添加此可选功能。

设置配置选项

无论是将语言识别与语音转文本还是与语音翻译一起使用，都需要了解一些常见概念和配置选项。

定义音频中预期的候选语言列表。
确定使用起始还是连续语言标识。

然后，向语音服务提出识别一次或连续识别请求。

本文提供代码片段来描述这些概念。提供了指向每个用例的完整示例的链接。

候选语言

向候选语言提供 AutoDetectSourceLanguageConfig 对象。你预计音频中至少有一个候选人。对于起始 LID，最多可添加 4 种语言；对于连续 LID，最多可添加 10 种语言。语音服务会返回提供的其中一种候选语言，即使音频中不存在这些语言。例如，如果提供 fr-FR（法语）和 en-US（英语）作为候选语言，但语音使用的是德语，则服务会返回 fr-FR 或 en-US。

您必须提供包括破折号 (-) 分隔符的完整语言区域，但对于每个基本语言，语言标识仅使用其中一个语言区域。对于同一语言，请勿包含多个区域设置（例如“en-US”和“en-GB”）。

var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-cn" });

auto autoDetectSourceLanguageConfig = 
    AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-cn" });

auto_detect_source_language_config = \
    speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-cn"])

AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE", "zh-cn"));

var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages([("en-US", "de-DE", "zh-cn"]);

NSArray *languages = @[@"en-US", @"de-DE", @"zh-cn"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
    [[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];

有关详细信息，请参阅支持的语言。

起始和连续语言识别

语音支持启动时和连续性语言识别 (LID)。

注意

只有 C#、C++、Java（仅限语音转文本）、JavaScript（仅限文本转语音）和 Python 中的语音 SDK 支持连续语言标识。

起始 LID 在音频开始后的几秒钟内立即标识语言。如果音频中的语言不会更改，请使用“开始时的语言识别（LID）”。使用起始 LID 时，可以在不到 5 秒的时间内检测并返回一种语言。
连续 LID 可以在音频期间识别多种语言。如果音频中的语言可能会变化，请使用“连续 LID”。 “连续语言识别（LID）”不支持在同一句子中更改语言。例如，如果你主要说西班牙语并插入一些英语单词，它不会检测每个单词的语言更改。

通过调用一次识别或连续识别的方法，可以实现起始 LID 或连续 LID。仅在连续识别的情况下才支持连续 LID。

单次识别或连续识别

语言标识通过识别对象和操作完成。向语音服务请求识别音频。

注意

请勿将识别与标识混淆。识别可以与语言标识一起使用，也可单独使用。

要么调用“识别一次”方法，或者启动和停止连续识别方法。你可从以下选项中进行选择：

使用“起始 LID”进行一次识别。 “识别一次”不支持连续 LID。
使用启动时 LID 和连续识别。
使用连续语言识别(LID)进行连续识别。

仅在连续 LID 情况下才需要 SpeechServiceConnection_LanguageIdMode 属性。如果不进行设置，语音服务将默认使用“启动时 LID”设置。支持的值为启动时 LID AtStart 或连续 LID Continuous。

// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
var result = await recognizer.RecognizeOnceAsync();

// Start and stop continuous recognition with At-start LID
await recognizer.StartContinuousRecognitionAsync();
await recognizer.StopContinuousRecognitionAsync();

// Start and stop continuous recognition with Continuous LID
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
await recognizer.StartContinuousRecognitionAsync();
await recognizer.StopContinuousRecognitionAsync();

// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
auto result = recognizer->RecognizeOnceAsync().get();

// Start and stop continuous recognition with At-start LID
recognizer->StartContinuousRecognitionAsync().get();
recognizer->StopContinuousRecognitionAsync().get();

// Start and stop continuous recognition with Continuous LID
speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
recognizer->StartContinuousRecognitionAsync().get();
recognizer->StopContinuousRecognitionAsync().get();

// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
SpeechRecognitionResult  result = recognizer->RecognizeOnceAsync().get();

// Start and stop continuous recognition with At-start LID
recognizer.startContinuousRecognitionAsync().get();
recognizer.stopContinuousRecognitionAsync().get();

// Start and stop continuous recognition with Continuous LID
speechConfig.setProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
recognizer.startContinuousRecognitionAsync().get();
recognizer.stopContinuousRecognitionAsync().get();

# Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
result = recognizer.recognize_once()

# Start and stop continuous recognition with At-start LID
recognizer.start_continuous_recognition()
recognizer.stop_continuous_recognition()

# Start and stop continuous recognition with Continuous LID
speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')
recognizer.start_continuous_recognition()
recognizer.stop_continuous_recognition()

使用“语音转文本”功能

当需要识别音频源中的语言并将其转录为文本时，使用语音转文本识别。有关详细信息，请参阅语音转文本概述。

注意

C#、C++、Python、Java、JavaScript 和 Objective-C 中的语音 SDK 支持使用起始语言标识的语音转文本识别。只有 C#、C++、Java、JavaScript 和 Python 中的语音 SDK 支持使用连续语言标识的语音转文本识别。

目前，对于使用连续语言识别的语音转文本识别，必须从终结点创建 SpeechConfig，如代码示例所示。

有关使用语言标识的语音转文本识别的更多示例，请参阅 GitHub。

识别一次
连续识别

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

var speechConfig = SpeechConfig.FromEndpoint(new Uri("YourSpeechEndpoint"), "YourSpeechKey");

var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(
        new string[] { "en-US", "de-DE", "zh-cn" });

using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using (var recognizer = new SpeechRecognizer(
    speechConfig,
    autoDetectSourceLanguageConfig,
    audioConfig))
{
    var speechRecognitionResult = await recognizer.RecognizeOnceAsync();
    var autoDetectSourceLanguageResult =
        AutoDetectSourceLanguageResult.FromResult(speechRecognitionResult);
    var detectedLanguage = autoDetectSourceLanguageResult.Language;
}

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

var config = SpeechConfig.FromEndpoint(new Uri("YourSpeechEndpoint"), "YourSpeechKey");

// Set Azure LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
config.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");

var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-cn" });

var stopRecognition = new TaskCompletionSource<int>();
using (var audioInput = AudioConfig.FromWavFileInput(@"en-us_zh-cn.wav"))
{
    using (var recognizer = new SpeechRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
    {
        // Subscribes to events.
        recognizer.Recognizing += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizingSpeech)
            {
                Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
                var autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.FromResult(e.Result);
                Console.WriteLine($"DETECTED: Language={autoDetectSourceLanguageResult.Language}");
            }
        };

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                var autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.FromResult(e.Result);
                Console.WriteLine($"DETECTED: Language={autoDetectSourceLanguageResult.Language}");
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };

        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"CANCELED: Reason={e.Reason}");

            if (e.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you set the speech resource key and endpoint values?");
            }

            stopRecognition.TrySetResult(0);
        };

        recognizer.SessionStarted += (s, e) =>
        {
            Console.WriteLine("\n    Session started event.");
        };

        recognizer.SessionStopped += (s, e) =>
        {
            Console.WriteLine("\n    Session stopped event.");
            Console.WriteLine("\nStop recognition.");
            stopRecognition.TrySetResult(0);
        };

        // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopRecognition.Task });

        // Stops recognition.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
    }
}

有关使用语言标识的语音转文本识别的更多示例，请参阅 GitHub。

识别一次
连续识别

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;

auto speechConfig = SpeechConfig::FromEndpoint("YourServiceEndpoint", "YourSpeechResoureKey");

auto autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-cn" });

auto recognizer = SpeechRecognizer::FromConfig(
    speechConfig,
    autoDetectSourceLanguageConfig
    );

speechRecognitionResult = recognizer->RecognizeOnceAsync().get();
auto autoDetectSourceLanguageResult =
    AutoDetectSourceLanguageResult::FromResult(speechRecognitionResult);
auto detectedLanguage = autoDetectSourceLanguageResult->Language;

    // Creates an instance of a speech config with specified subscription key and service region.
    // Note: For multi-lingual speech recognition with language id, it only works with speech v2 endpoint,
    // you must use FromEndpoint api in order to use the speech v2 endpoint.

    // Replace YourServiceRegion with your region, for example "chinanorth2", and
    // replace YourSubscriptionKey with your own speech key.
    string speechv2Endpoint = "wss://YourServiceRegion.stt.speech.azure.cn/speech/universal/v2";
    auto speechConfig = SpeechConfig::FromEndpoint(speechv2Endpoint, "YourSubscriptionKey");

    // Set the mode of input language detection to either "AtStart" (the default) or "Continuous".
    // Please refer to the documentation of Language ID for more information.
    // https://docs.azure.cn/cognitive-services/speech-service/language-identification?pivots=programming-language-cpp&tabs=once
    speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");

    // Define the set of languages to detect
    auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "zh-cn" });

    // Creates a speech recognizer using file as audio input.
    // Replace with your own audio file name.
    auto audioInput = AudioConfig::FromWavFileInput("en-us_zh-cn.wav");
    auto recognizer = SpeechRecognizer::FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioInput);

    // promise for synchronization of recognition end.
    promise<void> recognitionEnd;

    // Subscribes to events.
    recognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
        {
            auto lidResult = AutoDetectSourceLanguageResult::FromResult(e.Result);
            cout << "Recognizing in " << lidResult->Language << ": Text =" << e.Result->Text << std::endl;
        });

    recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
        {
            if (e.Result->Reason == ResultReason::RecognizedSpeech)
            {
                auto lidResult = AutoDetectSourceLanguageResult::FromResult(e.Result);
                cout << "RECOGNIZED in " << lidResult->Language << ": Text=" << e.Result->Text << "\n"
                    << "  Offset=" << e.Result->Offset() << "\n"
                    << "  Duration=" << e.Result->Duration() << std::endl;
            }
            else if (e.Result->Reason == ResultReason::NoMatch)
            {
                cout << "NOMATCH: Speech could not be recognized." << std::endl;
            }
        });

    recognizer->Canceled.Connect([&recognitionEnd](const SpeechRecognitionCanceledEventArgs& e)
        {
            cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;

            if (e.Reason == CancellationReason::Error)
            {
                cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << "\n"
                    << "CANCELED: ErrorDetails=" << e.ErrorDetails << "\n"
                    << "CANCELED: Did you update the subscription info?" << std::endl;

                recognitionEnd.set_value(); // Notify to stop recognition.
            }
        });

    recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
        {
            cout << "Session stopped.";
            recognitionEnd.set_value(); // Notify to stop recognition.
        });

    // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
    recognizer->StartContinuousRecognitionAsync().get();

    // Waits for recognition end.
    recognitionEnd.get_future().get();

    // Stops recognition.
    recognizer->StopContinuousRecognitionAsync().get();

有关使用语言标识的语音转文本识别的更多示例，请参阅 GitHub。

识别一次
连续识别

AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE"));

SpeechRecognizer recognizer = new SpeechRecognizer(
    speechConfig,
    autoDetectSourceLanguageConfig,
    audioConfig);

Future<SpeechRecognitionResult> future = recognizer.recognizeOnceAsync();
SpeechRecognitionResult result = future.get(30, TimeUnit.SECONDS);
AutoDetectSourceLanguageResult autoDetectSourceLanguageResult =
    AutoDetectSourceLanguageResult.fromResult(result);
String detectedLanguage = autoDetectSourceLanguageResult.getLanguage();

recognizer.close();
speechConfig.close();
autoDetectSourceLanguageConfig.close();
audioConfig.close();
result.close();

    // Shows how to do continuous speech recognition on a multilingual audio file with continuous language detection. Here, we assume the
    // spoken language in the file can alternate between English (US), Spanish (Mexico) and German.
    // If specified, speech recognition will use the custom model associated with the detected language.
    public static void continuousRecognitionFromFileWithContinuousLanguageDetectionWithCustomModels() throws InterruptedException, ExecutionException, IOException, URISyntaxException
    {
        // Creates an instance of a speech config with specified
        // subscription key and endpoint URL. Replace with your own subscription key
        // and endpoint URL.
        SpeechConfig speechConfig = SpeechConfig.fromEndpoint(new URI("YourEndpointUrl"), "YourSubscriptionKey");

        // Change the default from at-start language detection to continuous language detection, since the spoken language in the audio
        // may change.
        speechConfig.setProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");

        // Define a set of expected spoken languages in the audio, with an optional custom model endpoint ID associated with each.
        // Update the below with your own languages. Please see https://learn.microsoft.com/azure/cognitive-services/speech-service/language-support
        // for all supported languages.
        // Update the below with your own custom model endpoint IDs, or omit it if you want to use the standard model.
        List<SourceLanguageConfig> sourceLanguageConfigs = new ArrayList<SourceLanguageConfig>();
        sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("en-US", "YourEnUsCustomModelID"));
        sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("es-MX", "YourEsMxCustomModelID"));
        sourceLanguageConfigs.add(SourceLanguageConfig.fromLanguage("de-DE"));

        // Creates an instance of AutoDetectSourceLanguageConfig with the above 3 source language configurations.
        AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs(sourceLanguageConfigs);

        // We provide a WAV file with English and Spanish utterances as an example. Replace with your own multilingual audio file name.
        AudioConfig audioConfig = AudioConfig.fromWavFileInput( "es-mx_en-us.wav");

        // Creates a speech recognizer using file as audio input and the AutoDetectSourceLanguageConfig
        SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, autoDetectSourceLanguageConfig, audioConfig);

        // Semaphore used to signal the call to stop continuous recognition (following either a session ended or a cancelled event)
        final Semaphore doneSemaphone = new Semaphore(0);

        // Subscribes to events.

        /* Uncomment this to see intermediate recognition results. Since this is verbose and the WAV file is long, it is commented out by default in this sample.
        speechRecognizer.recognizing.addEventListener((s, e) -> {
            AutoDetectSourceLanguageResult autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.fromResult(e.getResult());
            String language = autoDetectSourceLanguageResult.getLanguage();
            System.out.println(" RECOGNIZING: Text = " + e.getResult().getText());
            System.out.println(" RECOGNIZING: Language = " + language);
        });
        */

        speechRecognizer.recognized.addEventListener((s, e) -> {
            AutoDetectSourceLanguageResult autoDetectSourceLanguageResult = AutoDetectSourceLanguageResult.fromResult(e.getResult());
            String language = autoDetectSourceLanguageResult.getLanguage();
            if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
                System.out.println(" RECOGNIZED: Text = " + e.getResult().getText());
                System.out.println(" RECOGNIZED: Language = " + language);
            }
            else if (e.getResult().getReason() == ResultReason.NoMatch) {
                if (language == null || language.isEmpty() || language.toLowerCase().equals("unknown")) {
                    System.out.println(" NOMATCH: Speech Language could not be detected.");
                }
                else {
                    System.out.println(" NOMATCH: Speech could not be recognized.");
                }
            }
        });

        speechRecognizer.canceled.addEventListener((s, e) -> {
            System.out.println(" CANCELED: Reason = " + e.getReason());
            if (e.getReason() == CancellationReason.Error) {
                System.out.println(" CANCELED: ErrorCode = " + e.getErrorCode());
                System.out.println(" CANCELED: ErrorDetails = " + e.getErrorDetails());
                System.out.println(" CANCELED: Did you update the subscription info?");
            }
            doneSemaphone.release();
        });

        speechRecognizer.sessionStarted.addEventListener((s, e) -> {
            System.out.println("\n Session started event.");
        });

        speechRecognizer.sessionStopped.addEventListener((s, e) -> {
            System.out.println("\n Session stopped event.");
            doneSemaphone.release();
        });

        // Starts continuous recognition and wait for processing to end
        System.out.println(" Recognizing from WAV file... please wait");
        speechRecognizer.startContinuousRecognitionAsync().get();
        doneSemaphone.tryAcquire(30, TimeUnit.SECONDS);

        // Stop continuous recognition
        speechRecognizer.stopContinuousRecognitionAsync().get();

        // These objects must be closed in order to dispose underlying native resources
        speechRecognizer.close();
        speechConfig.close();
        audioConfig.close();
        for (SourceLanguageConfig sourceLanguageConfig : sourceLanguageConfigs)
        {
            sourceLanguageConfig.close();
        }
        autoDetectSourceLanguageConfig.close();
}

有关使用语言标识的语音转文本识别的更多示例，请参阅 GitHub。

识别一次
连续识别

auto_detect_source_language_config = \
        speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE"])
speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, 
        auto_detect_source_language_config=auto_detect_source_language_config, 
        audio_config=audio_config)
result = speech_recognizer.recognize_once()
auto_detect_source_language_result = speechsdk.AutoDetectSourceLanguageResult(result)
detected_language = auto_detect_source_language_result.language

import azure.cognitiveservices.speech as speechsdk
import time
import json

speech_key, endpoint_string = "YourSpeechResoureKey","YourServiceEndpoint"
weatherfilename="en-us_zh-cn.wav"

speech_config = speechsdk.SpeechConfig(subscription=speech_key, endpoint=endpoint_string)
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

# Set Azure LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')

auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "de-DE", "zh-cn"])

speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, 
    auto_detect_source_language_config=auto_detect_source_language_config,
    audio_config=audio_config)

done = False

def stop_cb(evt):
    """callback that signals to stop continuous recognition upon receiving an event `evt`"""
    print('CLOSING on {}'.format(evt))
    nonlocal done
    done = True

# Connect callbacks to the events fired by the speech recognizer
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
# stop continuous recognition on either session stopped or canceled events
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)

# Start continuous speech recognition
speech_recognizer.start_continuous_recognition()
while not done:
    time.sleep(.5)

speech_recognizer.stop_continuous_recognition()

NSArray *languages = @[@"en-US", @"de-DE", @"zh-cn"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
        [[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];
SPXSpeechRecognizer* speechRecognizer = \
        [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig
                           autoDetectSourceLanguageConfiguration:autoDetectSourceLanguageConfig
                                              audioConfiguration:audioConfig];
SPXSpeechRecognitionResult *result = [speechRecognizer recognizeOnce];
SPXAutoDetectSourceLanguageResult *languageDetectionResult = [[SPXAutoDetectSourceLanguageResult alloc] init:result];
NSString *detectedLanguage = [languageDetectionResult language];

var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE"]);
var speechRecognizer = SpeechSDK.SpeechRecognizer.FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
speechRecognizer.recognizeOnceAsync((result: SpeechSDK.SpeechRecognitionResult) => {
        var languageDetectionResult = SpeechSDK.AutoDetectSourceLanguageResult.fromResult(result);
        var detectedLanguage = languageDetectionResult.language;
},
{});

语音转文本自定义模型

注意

自定义模型的语言检测只可用于实时语音转文本和语音翻译。批量听录仅支持默认基础模型的语言检测。

此示例演示如何将语言检测与自定义终结点一起使用。如果检测到的语言为 en-US，则示例使用默认模型。如果检测到的语言为 fr-FR，则示例使用自定义模型终结点。有关详细信息，请参阅部署自定义语音识别模型。

var sourceLanguageConfigs = new SourceLanguageConfig[]
{
    SourceLanguageConfig.FromLanguage("en-US"),
    SourceLanguageConfig.FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR")
};
var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromSourceLanguageConfigs(
        sourceLanguageConfigs);

std::vector<std::shared_ptr<SourceLanguageConfig>> sourceLanguageConfigs;
sourceLanguageConfigs.push_back(
    SourceLanguageConfig::FromLanguage("en-US"));
sourceLanguageConfigs.push_back(
    SourceLanguageConfig::FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));

auto autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig::FromSourceLanguageConfigs(
        sourceLanguageConfigs);

List sourceLanguageConfigs = new ArrayList<SourceLanguageConfig>();
sourceLanguageConfigs.add(
    SourceLanguageConfig.fromLanguage("en-US"));
sourceLanguageConfigs.add(
    SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));

AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs(
        sourceLanguageConfigs);

 en_language_config = speechsdk.languageconfig.SourceLanguageConfig("en-US")
 fr_language_config = speechsdk.languageconfig.SourceLanguageConfig("fr-FR", "The Endpoint Id for custom model of fr-FR")
 auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        sourceLanguageConfigs=[en_language_config, fr_language_config])

SPXSourceLanguageConfiguration* enLanguageConfig = [[SPXSourceLanguageConfiguration alloc]init:@"en-US"];
SPXSourceLanguageConfiguration* frLanguageConfig = \
        [[SPXSourceLanguageConfiguration alloc]initWithLanguage:@"fr-FR"
                                                     endpointId:@"The Endpoint Id for custom model of fr-FR"];
NSArray *languageConfigs = @[enLanguageConfig, frLanguageConfig];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
        [[SPXAutoDetectSourceLanguageConfiguration alloc]initWithSourceLanguageConfigurations:languageConfigs];

var enLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("en-US");
var frLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR");
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs([enLanguageConfig, frLanguageConfig]);

运行语音翻译

当需要识别音频源中的语言并将其翻译为另一种语言时，请使用语音翻译。有关详细信息，请参阅语音翻译概述。

注意

仅 C#、C++、Java、JavaScript 和 Python 中的语音 SDK 支持使用语言识别的语音翻译。

有关使用语言标识的语音翻译的更多示例，请参阅 GitHub。

识别一次
连续识别

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;

public static async Task RecognizeOnceSpeechTranslationAsync()
{
    var endpointUrl = new Uri("YourSpeechResoureEndpoint");

    var config = SpeechTranslationConfig.FromEndpoint(endpointUrl, "YourSpeechResoureKey");

    // Source language is required, but currently ignored. 
    string fromLanguage = "en-US";
    speechTranslationConfig.SpeechRecognitionLanguage = fromLanguage;

    speechTranslationConfig.AddTargetLanguage("de");
    speechTranslationConfig.AddTargetLanguage("fr");

    var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-cn" });

    using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();

    using (var recognizer = new TranslationRecognizer(
        speechTranslationConfig, 
        autoDetectSourceLanguageConfig,
        audioConfig))
    {

        Console.WriteLine("Say something or read from file...");
        var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);

        if (result.Reason == ResultReason.TranslatedSpeech)
        {
            var lidResult = result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

            Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={result.Text}");
            foreach (var element in result.Translations)
            {
                Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
            }
        }
    }
}

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;

public static async Task MultiLingualTranslation()
{
    var endpointUrl = new Uri("YourSpeechResoureEndpoint");

    var config = SpeechTranslationConfig.FromEndpoint(endpointUrl, "YourSpeechResoureKey");

    // Source language is required, but currently ignored. 
    string fromLanguage = "en-US";
    config.SpeechRecognitionLanguage = fromLanguage;

    config.AddTargetLanguage("de");
    config.AddTargetLanguage("fr");

    // Set Azure LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
    config.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
    var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-cn" });

    var stopTranslation = new TaskCompletionSource<int>();
    using (var audioInput = AudioConfig.FromWavFileInput(@"en-us_zh-cn.wav"))
    {
        using (var recognizer = new TranslationRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
        {
            recognizer.Recognizing += (s, e) =>
            {
                var lidResult = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

                Console.WriteLine($"RECOGNIZING in '{lidResult}': Text={e.Result.Text}");
                foreach (var element in e.Result.Translations)
                {
                    Console.WriteLine($"    TRANSLATING into '{element.Key}': {element.Value}");
                }
            };

            recognizer.Recognized += (s, e) => {
                if (e.Result.Reason == ResultReason.TranslatedSpeech)
                {
                    var lidResult = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

                    Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={e.Result.Text}");
                    foreach (var element in e.Result.Translations)
                    {
                        Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
                    }
                }
                else if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                    Console.WriteLine($"    Speech not translated.");
                }
                else if (e.Result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
            };

            recognizer.Canceled += (s, e) =>
            {
                Console.WriteLine($"CANCELED: Reason={e.Reason}");

                if (e.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                    Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                    Console.WriteLine($"CANCELED: Did you set the speech resource key and endpoint values?");
                }

                stopTranslation.TrySetResult(0);
            };

            recognizer.SpeechStartDetected += (s, e) => {
                Console.WriteLine("\nSpeech start detected event.");
            };

            recognizer.SpeechEndDetected += (s, e) => {
                Console.WriteLine("\nSpeech end detected event.");
            };

            recognizer.SessionStarted += (s, e) => {
                Console.WriteLine("\nSession started event.");
            };

            recognizer.SessionStopped += (s, e) => {
                Console.WriteLine("\nSession stopped event.");
                Console.WriteLine($"\nStop translation.");
                stopTranslation.TrySetResult(0);
            };

            // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
            Console.WriteLine("Start translation...");
            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            Task.WaitAny(new[] { stopTranslation.Task });
            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }
}

有关使用语言标识的语音翻译的更多示例，请参阅 GitHub。

识别一次
连续识别

auto endpointString = "YourSpeechResoureEndpoint";
auto config = SpeechTranslationConfig::FromEndpoint(endpointString, "YourSpeechResoureKey");

auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE" });

// Sets source and target languages
// The source language will be detected by the language detection feature. 
// However, the SpeechRecognitionLanguage still need to set with a locale string, but it will not be used as the source language.
// This will be fixed in a future version of Speech SDK.
auto fromLanguage = "en-US";
config->SetSpeechRecognitionLanguage(fromLanguage);
config->AddTargetLanguage("de");
config->AddTargetLanguage("fr");

// Creates a translation recognizer using microphone as audio input.
auto recognizer = TranslationRecognizer::FromConfig(config, autoDetectSourceLanguageConfig);
cout << "Say something...\n";

// Starts translation, and returns after a single utterance is recognized. The end of a
// single utterance is determined by listening for silence at the end or until a maximum of 15
// seconds of audio is processed. The task returns the recognized text as well as the translation.
// Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
// shot recognition like command or query.
// For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
auto result = recognizer->RecognizeOnceAsync().get();

// Checks result.
if (result->Reason == ResultReason::TranslatedSpeech)
{
    cout << "RECOGNIZED: Text=" << result->Text << std::endl;

    for (const auto& it : result->Translations)
    {
        cout << "TRANSLATED into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
    }
}
else if (result->Reason == ResultReason::RecognizedSpeech)
{
    cout << "RECOGNIZED: Text=" << result->Text << " (text could not be translated)" << std::endl;
}
else if (result->Reason == ResultReason::NoMatch)
{
    cout << "NOMATCH: Speech could not be recognized." << std::endl;
}
else if (result->Reason == ResultReason::Canceled)
{
    auto cancellation = CancellationDetails::FromResult(result);
    cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

    if (cancellation->Reason == CancellationReason::Error)
    {
        cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
        cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
        cout << "CANCELED: Did you set the speech resource key and endpoint values?" << std::endl;
    }
}

using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;
using namespace Microsoft::CognitiveServices::Speech::Translation;

void MultiLingualTranslation()
{
    auto config = SpeechTranslationConfig::FromEndpoint("YourSpeechResoureEndpoint", "YourSpeechResoureKey");

    // Set Azure LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
    speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
    auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-cn" });

    promise<void> recognitionEnd;
    // Source language is required, but currently ignored. 
    auto fromLanguage = "en-US";
    config->SetSpeechRecognitionLanguage(fromLanguage);
    config->AddTargetLanguage("de");
    config->AddTargetLanguage("fr");

    auto audioInput = AudioConfig::FromWavFileInput("whatstheweatherlike.wav");
    auto recognizer = TranslationRecognizer::FromConfig(config, autoDetectSourceLanguageConfig, audioInput);

    recognizer->Recognizing.Connect([](const TranslationRecognitionEventArgs& e)
        {
            std::string lidResult = e.Result->Properties.GetProperty(PropertyId::SpeechServiceConnection_AutoDetectSourceLanguageResult);

            cout << "Recognizing in Language = "<< lidResult << ":" << e.Result->Text << std::endl;
            for (const auto& it : e.Result->Translations)
            {
                cout << "  Translated into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
            }
        });

    recognizer->Recognized.Connect([](const TranslationRecognitionEventArgs& e)
        {
            if (e.Result->Reason == ResultReason::TranslatedSpeech)
            {
                std::string lidResult = e.Result->Properties.GetProperty(PropertyId::SpeechServiceConnection_AutoDetectSourceLanguageResult);
                cout << "RECOGNIZED in Language = " << lidResult << ": Text=" << e.Result->Text << std::endl;
            }
            else if (e.Result->Reason == ResultReason::RecognizedSpeech)
            {
                cout << "RECOGNIZED: Text=" << e.Result->Text << " (text could not be translated)" << std::endl;
            }
            else if (e.Result->Reason == ResultReason::NoMatch)
            {
                cout << "NOMATCH: Speech could not be recognized." << std::endl;
            }

            for (const auto& it : e.Result->Translations)
            {
                cout << "  Translated into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
            }
        });

    recognizer->Canceled.Connect([&recognitionEnd](const TranslationRecognitionCanceledEventArgs& e)
        {
            cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
            if (e.Reason == CancellationReason::Error)
            {
                cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << e.ErrorDetails << std::endl;
                cout << "CANCELED: Did you set the speech resource key and endpoint values?" << std::endl;

                recognitionEnd.set_value();
            }
        });

    recognizer->Synthesizing.Connect([](const TranslationSynthesisEventArgs& e)
        {
            auto size = e.Result->Audio.size();
            cout << "Translation synthesis result: size of audio data: " << size
                << (size == 0 ? "(END)" : "");
        });

    recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
        {
            cout << "Session stopped.";
            recognitionEnd.set_value();
        });

    // Starts continuos recognition. Use StopContinuousRecognitionAsync() to stop recognition.
    recognizer->StartContinuousRecognitionAsync().get();
    recognitionEnd.get_future().get();
    recognizer->StopContinuousRecognitionAsync().get();
}

有关使用语言标识的语音翻译的更多示例，请参阅 GitHub。

识别一次
连续识别

import azure.cognitiveservices.speech as speechsdk
import time
import json

speech_key, service_endpoint = "YourSpeechResoureKey","YourServiceEndpoint"
weatherfilename="en-us_zh-cn.wav"

# set up translation parameters: source language and target languages
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key,
    endpoint=service_endpoint,
    speech_recognition_language='en-US',
    target_languages=('de', 'fr'))
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

# Specify the AutoDetectSourceLanguageConfig, which defines the number of possible languages
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-cn"])

# Creates a translation recognizer using and audio file as input.
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config, 
    audio_config=audio_config,
    auto_detect_source_language_config=auto_detect_source_language_config)

# Starts translation, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed. The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = recognizer.recognize_once()

# Check the result
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print("""Recognized: {}
    German translation: {}
    French translation: {}""".format(
        result.text, result.translations['de'], result.translations['fr']))
elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
    detectedSrcLang = result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
    print("Detected Language: {}".format(detectedSrcLang))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    print("Translation canceled: {}".format(result.cancellation_details.reason))
    if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(result.cancellation_details.error_details))

import azure.cognitiveservices.speech as speechsdk
import time
import json

speech_key, service_endpoint = "YourSpeechResoureKey","YourServiceEndpoint"
weatherfilename="en-us_zh-cn.wav"

# Currently the v2 endpoint is required. In a future SDK release you won't need to set it. 
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key,
    endpoint=service_endpoint,
    speech_recognition_language='en-US',
    target_languages=('de', 'fr'))
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

# Set Azure LanguageIdMode (Optional; Either Continuous or AtStart are accepted; Default AtStart)
translation_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')

# Specify the AutoDetectSourceLanguageConfig, which defines the number of possible languages
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-cn"])

# Creates a translation recognizer using and audio file as input.
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config, 
    audio_config=audio_config,
    auto_detect_source_language_config=auto_detect_source_language_config)

def result_callback(event_type, evt):
    """callback to display a translation result"""
    print("{}: {}\n\tTranslations: {}\n\tResult Json: {}".format(
        event_type, evt, evt.result.translations.items(), evt.result.json))

done = False

def stop_cb(evt):
    """callback that signals to stop continuous recognition upon receiving an event `evt`"""
    print('CLOSING on {}'.format(evt))
    nonlocal done
    done = True

# connect callback functions to the events fired by the recognizer
recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
# event for intermediate results
recognizer.recognizing.connect(lambda evt: result_callback('RECOGNIZING', evt))
# event for final result
recognizer.recognized.connect(lambda evt: result_callback('RECOGNIZED', evt))
# cancellation event
recognizer.canceled.connect(lambda evt: print('CANCELED: {} ({})'.format(evt, evt.reason)))

# stop continuous recognition on either session stopped or canceled events
recognizer.session_stopped.connect(stop_cb)
recognizer.canceled.connect(stop_cb)

def synthesis_callback(evt):
    """
    callback for the synthesis event
    """
    print('SYNTHESIZING {}\n\treceived {} bytes of audio. Reason: {}'.format(
        evt, len(evt.result.audio), evt.result.reason))
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("RECOGNIZED: {}".format(evt.result.properties))
        if evt.result.properties.get(speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult) == None:
            print("Unable to detect any language")
        else:
            detectedSrcLang = evt.result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
            jsonResult = evt.result.properties[speechsdk.PropertyId.SpeechServiceResponse_JsonResult]
            detailResult = json.loads(jsonResult)
            startOffset = detailResult['Offset']
            duration = detailResult['Duration']
            if duration >= 0:
                endOffset = duration + startOffset
            else:
                endOffset = 0
            print("Detected language = " + detectedSrcLang + ", startOffset = " + str(startOffset) + " nanoseconds, endOffset = " + str(endOffset) + " nanoseconds, Duration = " + str(duration) + " nanoseconds.")
            global language_detected
            language_detected = True

# connect callback to the synthesis event
recognizer.synthesizing.connect(synthesis_callback)

# start translation
recognizer.start_continuous_recognition()

while not done:
    time.sleep(.5)

recognizer.stop_continuous_recognition()

Last updated on 2026-06-09

实现语言识别

设置配置选项

候选语言

起始和连续语言识别

单次识别或连续识别

使用“语音转文本”功能

语音转文本自定义模型

运行语音翻译

相关内容

Recursos adicionales