Implement language identification
Language identification is used to identify the languages spoken in audio by comparing the audio against a list of supported languages.
Language identification (LID) use cases include:
- Speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.
- Speech translation when you need to identify the language in an audio source and then translate it to another language.
For speech recognition, the initial latency is higher with language identification. You should only include this optional feature as needed.
Set configuration options
Whether you use language identification with speech to text or with speech translation, there are some common concepts and configuration options.
- Define a list of candidate languages that you expect in the audio.
- Decide whether to use at-start or continuous language identification.
Then you make a recognize once or continuous recognition request to the Speech service.
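Taken together, those choices look roughly like the following minimal C# sketch. It assumes the placeholder values YourSubscriptionKey and YourServiceRegion and a default microphone input; each class and property is covered in detail in the sections that follow.
var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
// Define the candidate languages you expect in the audio.
var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE" });
// At-start LID is the default. To opt in to continuous LID instead, set
// PropertyId.SpeechServiceConnection_LanguageIdMode to "Continuous" (see the sections below).
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var recognizer = new SpeechRecognizer(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
// Make a recognize once (or continuous) recognition request to the Speech service.
var result = await recognizer.RecognizeOnceAsync();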
Important
Language identification APIs are simplified with the Speech SDK version 1.25 and later. The SpeechServiceConnection_SingleLanguageIdPriority and SpeechServiceConnection_ContinuousLanguageIdPriority properties have been removed. A single property, SpeechServiceConnection_LanguageIdMode, replaces them. You no longer need to prioritize between low latency and high accuracy. For continuous speech recognition or translation, you only need to select whether to run at-start or continuous language identification.
This article provides code snippets to describe the concepts. Links to complete samples for each use case are provided.
Candidate languages
You provide candidate languages with the AutoDetectSourceLanguageConfig object. You expect that at least one of the candidates is in the audio. You can include up to four languages for at-start LID or up to 10 languages for continuous LID. The Speech service returns one of the candidate languages provided even if those languages weren't in the audio. For example, if fr-FR (French) and en-US (English) are provided as candidates, but German is spoken, the service returns either fr-FR or en-US.
You must provide the full locale with dash (-) separator, but language identification only uses one locale per base language. Don't include multiple locales for the same language, for example, en-US and en-GB.
var autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-cn" });
auto autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-cn" });
auto_detect_source_language_config = \
speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-cn"])
AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE", "zh-cn"));
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE", "zh-cn"]);
NSArray *languages = @[@"en-US", @"de-DE", @"zh-cn"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
[[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];
For more information, see supported languages.
At-start and continuous language identification
Speech supports both at-start and continuous language identification (LID).
Note
Continuous language identification is only supported with Speech SDKs in C#, C++, Java (for speech to text only), JavaScript (for speech to text only), and Python.
- At-start LID identifies the language once within the first few seconds of audio. Use at-start LID if the language in the audio doesn't change. With at-start LID, a single language is detected and returned in less than 5 seconds.
- Continuous LID can identify multiple languages during the audio. Use continuous LID if the language in the audio could change. Continuous LID doesn't support changing languages within the same sentence. For example, if you're primarily speaking Spanish and insert some English words, it doesn't detect the language change per word.
You implement at-start LID or continuous LID by calling the methods for recognize once or continuous recognition. Continuous LID is only supported with continuous recognition.
Recognize once or continuous
Language identification is completed with recognition objects and operations: you make a request to the Speech service to recognize audio.
Note
Don't confuse recognition with identification. Recognition can be used with or without language identification.
Either call the "recognize once" method, or the start and stop continuous recognition methods. You choose from:
- Recognize once with at-start LID. Continuous LID isn't supported for recognize once.
- Use continuous recognition with at-start LID.
- Use continuous recognition with continuous LID.
The SpeechServiceConnection_LanguageIdMode property is only required for continuous LID. Without it, the Speech service defaults to at-start LID. The supported values are AtStart for at-start LID or Continuous for continuous LID.
// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
var result = await recognizer.RecognizeOnceAsync();
// Start and stop continuous recognition with At-start LID
await recognizer.StartContinuousRecognitionAsync();
await recognizer.StopContinuousRecognitionAsync();
// Start and stop continuous recognition with Continuous LID
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
await recognizer.StartContinuousRecognitionAsync();
await recognizer.StopContinuousRecognitionAsync();
// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
auto result = recognizer->RecognizeOnceAsync().get();
// Start and stop continuous recognition with At-start LID
recognizer->StartContinuousRecognitionAsync().get();
recognizer->StopContinuousRecognitionAsync().get();
// Start and stop continuous recognition with Continuous LID
speechConfig->SetProperty(PropertyId::SpeechServiceConnection_LanguageIdMode, "Continuous");
recognizer->StartContinuousRecognitionAsync().get();
recognizer->StopContinuousRecognitionAsync().get();
// Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();
// Start and stop continuous recognition with At-start LID
recognizer.startContinuousRecognitionAsync().get();
recognizer.stopContinuousRecognitionAsync().get();
// Start and stop continuous recognition with Continuous LID
speechConfig.setProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
recognizer.startContinuousRecognitionAsync().get();
recognizer.stopContinuousRecognitionAsync().get();
# Recognize once with At-start LID. Continuous LID isn't supported for recognize once.
result = recognizer.recognize_once()
# Start and stop continuous recognition with At-start LID
recognizer.start_continuous_recognition()
recognizer.stop_continuous_recognition()
# Start and stop continuous recognition with Continuous LID
speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, value='Continuous')
recognizer.start_continuous_recognition()
recognizer.stop_continuous_recognition()
Use speech to text
You use Speech to text recognition when you need to identify the language in an audio source and then transcribe it to text. For more information, see Speech to text overview.
Note
Speech to text recognition with at-start language identification is supported with Speech SDKs in C#, C++, Python, Java, JavaScript, and Objective-C. Speech to text recognition with continuous language identification is only supported with Speech SDKs in C#, C++, Java, JavaScript, and Python.
Currently for speech to text recognition with continuous language identification, you must create a SpeechConfig from the wss://{region}.stt.speech.azure.cn/speech/universal/v2 endpoint string, as shown in code examples. In a future SDK release you won't need to set it.
See more examples of speech to text recognition with language identification on GitHub.
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey","YourServiceRegion");
var autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.FromLanguages(
new string[] { "en-US", "de-DE", "zh-cn" });
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using (var recognizer = new SpeechRecognizer(
speechConfig,
autoDetectSourceLanguageConfig,
audioConfig))
{
var speechRecognitionResult = await recognizer.RecognizeOnceAsync();
var autoDetectSourceLanguageResult =
AutoDetectSourceLanguageResult.FromResult(speechRecognitionResult);
var detectedLanguage = autoDetectSourceLanguageResult.Language;
}
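The preceding example uses recognize once with at-start LID. For continuous LID with speech to text, the following is a minimal C# sketch rather than the complete sample (see the GitHub samples linked below for a full version). It assumes the v2 endpoint from the preceding note, the same placeholder key and region, and a default microphone input.
var region = "YourServiceRegion";
// Currently the v2 endpoint is required for continuous LID. In a future SDK release you won't need to set it.
var endpointUrl = new Uri($"wss://{region}.stt.speech.azure.cn/speech/universal/v2");
var speechConfig = SpeechConfig.FromEndpoint(endpointUrl, "YourSubscriptionKey");
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-cn" });
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var recognizer = new SpeechRecognizer(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
recognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        // The detected language can change from one recognized phrase to the next.
        var lidResult = AutoDetectSourceLanguageResult.FromResult(e.Result);
        Console.WriteLine($"RECOGNIZED in '{lidResult.Language}': {e.Result.Text}");
    }
};
await recognizer.StartContinuousRecognitionAsync();
// ... let audio stream in, then stop ...
await recognizer.StopContinuousRecognitionAsync();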
See more examples of speech to text recognition with language identification on GitHub.
using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Audio;
auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey","YourServiceRegion");
auto autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-cn" });
auto recognizer = SpeechRecognizer::FromConfig(
speechConfig,
autoDetectSourceLanguageConfig
);
auto speechRecognitionResult = recognizer->RecognizeOnceAsync().get();
auto autoDetectSourceLanguageResult =
AutoDetectSourceLanguageResult::FromResult(speechRecognitionResult);
auto detectedLanguage = autoDetectSourceLanguageResult->Language;
See more examples of speech to text recognition with language identification on GitHub.
AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE"));
SpeechRecognizer recognizer = new SpeechRecognizer(
speechConfig,
autoDetectSourceLanguageConfig,
audioConfig);
Future<SpeechRecognitionResult> future = recognizer.recognizeOnceAsync();
SpeechRecognitionResult result = future.get(30, TimeUnit.SECONDS);
AutoDetectSourceLanguageResult autoDetectSourceLanguageResult =
AutoDetectSourceLanguageResult.fromResult(result);
String detectedLanguage = autoDetectSourceLanguageResult.getLanguage();
recognizer.close();
speechConfig.close();
autoDetectSourceLanguageConfig.close();
audioConfig.close();
result.close();
See more examples of speech to text recognition with language identification on GitHub.
auto_detect_source_language_config = \
speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE"])
speech_recognizer = speechsdk.SpeechRecognizer(
speech_config=speech_config,
auto_detect_source_language_config=auto_detect_source_language_config,
audio_config=audio_config)
result = speech_recognizer.recognize_once()
auto_detect_source_language_result = speechsdk.AutoDetectSourceLanguageResult(result)
detected_language = auto_detect_source_language_result.language
NSArray *languages = @[@"en-US", @"de-DE", @"zh-cn"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
[[SPXAutoDetectSourceLanguageConfiguration alloc]init:languages];
SPXSpeechRecognizer* speechRecognizer = \
[[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig
autoDetectSourceLanguageConfiguration:autoDetectSourceLanguageConfig
audioConfiguration:audioConfig];
SPXSpeechRecognitionResult *result = [speechRecognizer recognizeOnce];
SPXAutoDetectSourceLanguageResult *languageDetectionResult = [[SPXAutoDetectSourceLanguageResult alloc] init:result];
NSString *detectedLanguage = [languageDetectionResult language];
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE"]);
var speechRecognizer = SpeechSDK.SpeechRecognizer.FromConfig(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
speechRecognizer.recognizeOnceAsync((result: SpeechSDK.SpeechRecognitionResult) => {
var languageDetectionResult = SpeechSDK.AutoDetectSourceLanguageResult.fromResult(result);
var detectedLanguage = languageDetectionResult.language;
},
{});
Speech to text custom models
Note
Language detection with custom models can only be used with real-time speech to text and speech translation. Batch transcription only supports language detection for default base models.
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
var sourceLanguageConfigs = new SourceLanguageConfig[]
{
SourceLanguageConfig.FromLanguage("en-US"),
SourceLanguageConfig.FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR")
};
var autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.FromSourceLanguageConfigs(
sourceLanguageConfigs);
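You then pass this configuration to a recognizer the same way as in the earlier speech to text example. The following is a minimal C# sketch, assuming a speechConfig and default microphone input like those used earlier in this article.
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var recognizer = new SpeechRecognizer(speechConfig, autoDetectSourceLanguageConfig, audioConfig);
var result = await recognizer.RecognizeOnceAsync();
// If fr-FR is detected, the result comes from the custom model tied to that endpoint ID.
var detectedLanguage = AutoDetectSourceLanguageResult.FromResult(result).Language;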
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
std::vector<std::shared_ptr<SourceLanguageConfig>> sourceLanguageConfigs;
sourceLanguageConfigs.push_back(
SourceLanguageConfig::FromLanguage("en-US"));
sourceLanguageConfigs.push_back(
SourceLanguageConfig::FromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));
auto autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig::FromSourceLanguageConfigs(
sourceLanguageConfigs);
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
List<SourceLanguageConfig> sourceLanguageConfigs = new ArrayList<>();
sourceLanguageConfigs.add(
SourceLanguageConfig.fromLanguage("en-US"));
sourceLanguageConfigs.add(
SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR"));
AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs(
sourceLanguageConfigs);
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
en_language_config = speechsdk.languageconfig.SourceLanguageConfig("en-US")
fr_language_config = speechsdk.languageconfig.SourceLanguageConfig("fr-FR", "The Endpoint Id for custom model of fr-FR")
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
sourceLanguageConfigs=[en_language_config, fr_language_config])
This sample shows how to use language detection with a custom endpoint. If the detected language is en-US, the example uses the default model. If the detected language is fr-FR, the example uses the custom model endpoint. For more information, see Deploy a custom speech model.
SPXSourceLanguageConfiguration* enLanguageConfig = [[SPXSourceLanguageConfiguration alloc]init:@"en-US"];
SPXSourceLanguageConfiguration* frLanguageConfig = \
[[SPXSourceLanguageConfiguration alloc]initWithLanguage:@"fr-FR"
endpointId:@"The Endpoint Id for custom model of fr-FR"];
NSArray *languageConfigs = @[enLanguageConfig, frLanguageConfig];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
[[SPXAutoDetectSourceLanguageConfiguration alloc]initWithSourceLanguageConfigurations:languageConfigs];
var enLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("en-US");
var frLanguageConfig = SpeechSDK.SourceLanguageConfig.fromLanguage("fr-FR", "The Endpoint Id for custom model of fr-FR");
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs([enLanguageConfig, frLanguageConfig]);
Run speech translation
Use Speech translation when you need to identify the language in an audio source and then translate it to another language. For more information, see Speech translation overview.
Note
Speech translation with language identification is only supported with Speech SDKs in C#, C++, JavaScript, and Python.
Currently for speech translation with language identification, you must create a SpeechTranslationConfig from the wss://{region}.stt.speech.azure.cn/speech/universal/v2 endpoint string, as shown in code examples. In a future SDK release you won't need to set it.
See more examples of speech translation with language identification on GitHub.
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;
public static async Task RecognizeOnceSpeechTranslationAsync()
{
var region = "YourServiceRegion";
// Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
var endpointString = $"wss://{region}.stt.speech.azure.cn/speech/universal/v2";
var endpointUrl = new Uri(endpointString);
var speechTranslationConfig = SpeechTranslationConfig.FromEndpoint(endpointUrl, "YourSubscriptionKey");
// Source language is required, but currently ignored.
string fromLanguage = "en-US";
speechTranslationConfig.SpeechRecognitionLanguage = fromLanguage;
speechTranslationConfig.AddTargetLanguage("de");
speechTranslationConfig.AddTargetLanguage("fr");
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-cn" });
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using (var recognizer = new TranslationRecognizer(
speechTranslationConfig,
autoDetectSourceLanguageConfig,
audioConfig))
{
Console.WriteLine("Say something or read from file...");
var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
if (result.Reason == ResultReason.TranslatedSpeech)
{
var lidResult = result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);
Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={result.Text}");
foreach (var element in result.Translations)
{
Console.WriteLine($" TRANSLATED into '{element.Key}': {element.Value}");
}
}
}
}
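The preceding example uses recognize once with at-start LID. For continuous LID with speech translation, the following is a minimal C# sketch rather than the complete sample; it assumes a speechTranslationConfig and autoDetectSourceLanguageConfig built as in the preceding example (including the v2 endpoint) and subscribes to the Recognized event instead of calling recognize once.
speechTranslationConfig.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var recognizer = new TranslationRecognizer(speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig);
recognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.TranslatedSpeech)
    {
        // The detected source language is returned as a property on each result.
        var lidResult = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);
        Console.WriteLine($"RECOGNIZED in '{lidResult}': Text={e.Result.Text}");
        foreach (var element in e.Result.Translations)
        {
            Console.WriteLine($" TRANSLATED into '{element.Key}': {element.Value}");
        }
    }
};
await recognizer.StartContinuousRecognitionAsync();
// ... let audio stream in, then stop ...
await recognizer.StopContinuousRecognitionAsync();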
See more examples of speech translation with language identification on GitHub.
auto region = "YourServiceRegion";
// Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
auto endpointString = std::format("wss://{}.stt.speech.azure.cn/speech/universal/v2", region);
auto config = SpeechTranslationConfig::FromEndpoint(endpointString, "YourSubscriptionKey");
auto autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE" });
// Sets source and target languages.
// The source language is detected by the language identification feature.
// However, SpeechRecognitionLanguage still needs to be set with a locale string; it won't be used as the source language.
// This will be fixed in a future version of the Speech SDK.
auto fromLanguage = "en-US";
config->SetSpeechRecognitionLanguage(fromLanguage);
config->AddTargetLanguage("de");
config->AddTargetLanguage("fr");
// Creates a translation recognizer using microphone as audio input.
auto recognizer = TranslationRecognizer::FromConfig(config, autoDetectSourceLanguageConfig);
cout << "Say something...\n";
// Starts translation, and returns after a single utterance is recognized. The end of a
// single utterance is determined by listening for silence at the end or until a maximum of 15
// seconds of audio is processed. The task returns the recognized text as well as the translation.
// Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
// shot recognition like command or query.
// For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
auto result = recognizer->RecognizeOnceAsync().get();
// Checks result.
if (result->Reason == ResultReason::TranslatedSpeech)
{
cout << "RECOGNIZED: Text=" << result->Text << std::endl;
for (const auto& it : result->Translations)
{
cout << "TRANSLATED into '" << it.first.c_str() << "': " << it.second.c_str() << std::endl;
}
}
else if (result->Reason == ResultReason::RecognizedSpeech)
{
cout << "RECOGNIZED: Text=" << result->Text << " (text could not be translated)" << std::endl;
}
else if (result->Reason == ResultReason::NoMatch)
{
cout << "NOMATCH: Speech could not be recognized." << std::endl;
}
else if (result->Reason == ResultReason::Canceled)
{
auto cancellation = CancellationDetails::FromResult(result);
cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
if (cancellation->Reason == CancellationReason::Error)
{
cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
}
}
See more examples of speech translation with language identification on GitHub.
import azure.cognitiveservices.speech as speechsdk
import time
import json
speech_key, service_region = "YourSubscriptionKey","YourServiceRegion"
weatherfilename="en-us_zh-cn.wav"
# set up translation parameters: source language and target languages
# Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
endpoint_string = "wss://{}.stt.speech.azure.cn/speech/universal/v2".format(service_region)
translation_config = speechsdk.translation.SpeechTranslationConfig(
subscription=speech_key,
endpoint=endpoint_string,
speech_recognition_language='en-US',
target_languages=('de', 'fr'))
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
# Specify the AutoDetectSourceLanguageConfig, which defines the number of possible languages
auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-cn"])
# Creates a translation recognizer using an audio file as input.
recognizer = speechsdk.translation.TranslationRecognizer(
translation_config=translation_config,
audio_config=audio_config,
auto_detect_source_language_config=auto_detect_source_language_config)
# Starts translation, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed. The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = recognizer.recognize_once()
# Check the result
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
print("""Recognized: {}
German translation: {}
French translation: {}""".format(
result.text, result.translations['de'], result.translations['fr']))
elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
print("Recognized: {}".format(result.text))
detectedSrcLang = result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
print("Detected Language: {}".format(detectedSrcLang))
elif result.reason == speechsdk.ResultReason.NoMatch:
print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
print("Translation canceled: {}".format(result.cancellation_details.reason))
if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
print("Error details: {}".format(result.cancellation_details.error_details))