Use pronunciation assessment

Article
04/01/2024

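In this article, you learn how to evaluate pronunciation with speech to text through the Speech SDK. Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio.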

Use pronunciation assessment in streaming mode

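Pronunciation assessment supports uninterrupted streaming mode. With the Speech SDK, recording time can be unlimited. As long as you don't stop recording, the assessment doesn't end, and you can pause and resume it conveniently.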

For information about the availability of pronunciation assessment, see supported languages and available regions.

As a baseline, pronunciation assessment usage costs the same as speech to text for pay-as-you-go pricing. For details, see Pricing.

To learn how to use pronunciation assessment in streaming mode in your own application, see the sample code.

Set configuration parameters

Note

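Pronunciation assessment isn't available with the Speech SDK for Go. You can read about the concepts in this guide, but select another programming language for your solution.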

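In the SpeechRecognizer, you can specify the language that you're learning or practicing to improve your pronunciation. The default locale is en-US. To learn how to specify the learning language for pronunciation assessment in your own application, see the sample code.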

Tip

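If you aren't sure which locale to set for a language that has multiple locales, try each locale separately. For instance, for Spanish, try es-ES and es-MX, and determine which locale scores higher for your scenario.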

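You must create a PronunciationAssessmentConfig object. You can call EnableProsodyAssessment and EnableContentAssessmentWithTopic to enable prosody assessment and content assessment. For more information, see Configuration methods.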

var pronunciationAssessmentConfig = new PronunciationAssessmentConfig( 
    referenceText: "", 
    gradingSystem: GradingSystem.HundredMark,  
    granularity: Granularity.Phoneme,  
    enableMiscue: false); 
pronunciationAssessmentConfig.EnableProsodyAssessment(); 
pronunciationAssessmentConfig.EnableContentAssessmentWithTopic("greeting");

auto pronunciationConfig = PronunciationAssessmentConfig::Create("", PronunciationAssessmentGradingSystem::HundredMark, PronunciationAssessmentGranularity::Phoneme, false); 
pronunciationConfig->EnableProsodyAssessment(); 
pronunciationConfig->EnableContentAssessmentWithTopic("greeting");

PronunciationAssessmentConfig pronunciationConfig = new PronunciationAssessmentConfig("", 
    PronunciationAssessmentGradingSystem.HundredMark, PronunciationAssessmentGranularity.Phoneme, false); 
pronunciationConfig.enableProsodyAssessment(); 
pronunciationConfig.enableContentAssessmentWithTopic("greeting");

pronunciation_config = speechsdk.PronunciationAssessmentConfig( 
    reference_text="", 
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark, 
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme, 
    enable_miscue=False) 
pronunciation_config.enable_prosody_assessment() 
pronunciation_config.enable_content_assessment_with_topic("greeting")

// JavaScript has no named arguments, so pass the constructor arguments positionally:
// referenceText, gradingSystem, granularity, enableMiscue.
var pronunciationAssessmentConfig = new sdk.PronunciationAssessmentConfig(
    "",
    sdk.PronunciationAssessmentGradingSystem.HundredMark,
    sdk.PronunciationAssessmentGranularity.Phoneme,
    false);
// In the JavaScript SDK, prosody assessment is enabled through a property setter.
pronunciationAssessmentConfig.enableProsodyAssessment = true;
pronunciationAssessmentConfig.enableContentAssessmentWithTopic("greeting");

SPXPronunciationAssessmentConfiguration *pronunciationConfig =
    [[SPXPronunciationAssessmentConfiguration alloc] init:@"" gradingSystem:SPXPronunciationAssessmentGradingSystem_HundredMark granularity:SPXPronunciationAssessmentGranularity_Phoneme enableMiscue:false];
[pronunciationConfig enableProsodyAssessment];
[pronunciationConfig enableContentAssessmentWithTopic:@"greeting"];

let pronAssessmentConfig = try! SPXPronunciationAssessmentConfiguration("", 
    gradingSystem: .hundredMark, 
    granularity: .phoneme, 
    enableMiscue: false) 
pronAssessmentConfig.enableProsodyAssessment() 
pronAssessmentConfig.enableContentAssessment(withTopic: "greeting")

The following table lists some of the key configuration parameters for pronunciation assessment.

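Parameter	Description
`ReferenceText`	The text that the pronunciation is evaluated against. The `ReferenceText` parameter is optional. Set the reference text if you want to run a scripted assessment for the reading language learning scenario. Don't set the reference text if you want to run an unscripted assessment. For the pricing difference between scripted and unscripted assessment, see Pricing.
`GradingSystem`	The point system for score calibration. `FivePoint` gives a 0-5 floating point score, and `HundredMark` gives a 0-100 floating point score. Default: `FivePoint`.
`Granularity`	Determines the lowest level of evaluation granularity. Scores are returned for levels greater than or equal to the minimal value. Accepted values are `Phoneme`, which shows scores at the full text, word, syllable, and phoneme level; `Word`, which shows scores at the full text and word level; or `FullText`, which shows the score at the full text level only. The full reference text can be a word, sentence, or paragraph, depending on your input reference text. Default: `Phoneme`.
`EnableMiscue`	Enables miscue calculation when the pronounced words are compared to the reference text. Enabling miscue is optional. If this value is `True`, the `ErrorType` result value can be set to `Omission` or `Insertion` based on the comparison. Values are `False` and `True`. Default: `False`. To enable miscue calculation, set `EnableMiscue` to `True` (the last constructor argument in the preceding examples).
`ScenarioId`	A GUID that indicates a customized point system.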
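The `Granularity` row can be read as a lookup from the configured value to the levels that receive scores. The following Python sketch encodes that table. `GRANULARITY_LEVELS` and `levels_for` are hypothetical helper names, not Speech SDK APIs, and syllable-level scores apply only where syllable groups are supported (en-US).

```python
# Levels scored for each Granularity value, transcribed from the parameter table.
GRANULARITY_LEVELS = {
    "Phoneme": ("FullText", "Word", "Syllable", "Phoneme"),  # Syllable: en-US only
    "Word": ("FullText", "Word"),
    "FullText": ("FullText",),
}


def levels_for(granularity: str = "Phoneme") -> tuple:
    """Return the result levels that receive scores for a Granularity value."""
    if granularity not in GRANULARITY_LEVELS:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    return GRANULARITY_LEVELS[granularity]


# Word granularity: scores for the full text and for each word.
print(levels_for("Word"))
```

The default argument mirrors the documented default of `Phoneme`, so calling `levels_for()` with no argument returns every level.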

Configuration methods

The following table lists some of the optional methods that you can set for the PronunciationAssessmentConfig object.

Note

Content and prosody assessments are only available in the en-US locale.

To explore the content and prosody assessments, upgrade to Speech SDK version 1.35.0 or later.

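Method	Description
`EnableProsodyAssessment`	Enables prosody assessment for your pronunciation evaluation. This feature assesses aspects like stress, intonation, speaking speed, and rhythm, and gives insight into the naturalness and expressiveness of your speech. Enabling prosody assessment is optional. If this method is called, the `ProsodyScore` result value is returned.
`EnableContentAssessmentWithTopic`	Enables content assessment. Content assessment is part of the unscripted assessment for the speaking language learning scenario. By providing a description, you can improve the assessment's understanding of the specific topic being spoken about. For example, in C#, call `pronunciationAssessmentConfig.EnableContentAssessmentWithTopic("greeting");`. You can replace "greeting" with your own text to describe the topic. The description has no length limit and currently only supports the `en-US` locale.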

Get pronunciation assessment results

When speech is recognized, you can request the pronunciation assessment results as SDK objects or as a JSON string.

using (var speechRecognizer = new SpeechRecognizer(
    speechConfig,
    audioConfig))
{
    pronunciationAssessmentConfig.ApplyTo(speechRecognizer);
    var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();

    // The pronunciation assessment result as a Speech SDK object
    var pronunciationAssessmentResult =
        PronunciationAssessmentResult.FromResult(speechRecognitionResult);

    // The pronunciation assessment result as a JSON string
    var pronunciationAssessmentResultJson = speechRecognitionResult.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
}

With the Speech SDK for C++, word, syllable, and phoneme results aren't available through SDK objects. They're available only in the JSON string.

auto speechRecognizer = SpeechRecognizer::FromConfig(
    speechConfig,
    audioConfig);

pronunciationAssessmentConfig->ApplyTo(speechRecognizer);
auto speechRecognitionResult = speechRecognizer->RecognizeOnceAsync().get();

// The pronunciation assessment result as a Speech SDK object
auto pronunciationAssessmentResult =
    PronunciationAssessmentResult::FromResult(speechRecognitionResult);

// The pronunciation assessment result as a JSON string
auto pronunciationAssessmentResultJson = speechRecognitionResult->Properties.GetProperty(PropertyId::SpeechServiceResponse_JsonResult);

To learn how to specify the learning language for pronunciation assessment in your own application, see the sample code.

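For Android application development, word, syllable, and phoneme results are available through SDK objects with the Speech SDK for Java, and also in the JSON string. For Java Runtime (JRE) application development, word, syllable, and phoneme results are available only in the JSON string.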

SpeechRecognizer speechRecognizer = new SpeechRecognizer(
    speechConfig,
    audioConfig);

pronunciationAssessmentConfig.applyTo(speechRecognizer);
Future<SpeechRecognitionResult> future = speechRecognizer.recognizeOnceAsync();
SpeechRecognitionResult speechRecognitionResult = future.get(30, TimeUnit.SECONDS);

// The pronunciation assessment result as a Speech SDK object
PronunciationAssessmentResult pronunciationAssessmentResult =
    PronunciationAssessmentResult.fromResult(speechRecognitionResult);

// The pronunciation assessment result as a JSON string
String pronunciationAssessmentResultJson = speechRecognitionResult.getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult);

speechRecognizer.close();
speechConfig.close();
audioConfig.close();
pronunciationAssessmentConfig.close();
speechRecognitionResult.close();

var speechRecognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

pronunciationAssessmentConfig.applyTo(speechRecognizer);

speechRecognizer.recognizeOnceAsync((speechRecognitionResult: SpeechSDK.SpeechRecognitionResult) => {
    // The pronunciation assessment result as a Speech SDK object
    var pronunciationAssessmentResult = SpeechSDK.PronunciationAssessmentResult.fromResult(speechRecognitionResult);

    // The pronunciation assessment result as a JSON string
    var pronunciationAssessmentResultJson = speechRecognitionResult.properties.getProperty(SpeechSDK.PropertyId.SpeechServiceResponse_JsonResult);
},
(error) => {
    // Called if recognition fails
    console.error(error);
});


speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    audio_config=audio_config)

pronunciation_assessment_config.apply_to(speech_recognizer)
speech_recognition_result = speech_recognizer.recognize_once()

# The pronunciation assessment result as a Speech SDK object
pronunciation_assessment_result = speechsdk.PronunciationAssessmentResult(speech_recognition_result)

# The pronunciation assessment result as a JSON string
pronunciation_assessment_result_json = speech_recognition_result.properties.get(speechsdk.PropertyId.SpeechServiceResponse_JsonResult)


SPXSpeechRecognizer* speechRecognizer =
    [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig
                                          audioConfiguration:audioConfig];

[pronunciationAssessmentConfig applyToRecognizer:speechRecognizer];

SPXSpeechRecognitionResult *speechRecognitionResult = [speechRecognizer recognizeOnce];

// The pronunciation assessment result as a Speech SDK object
SPXPronunciationAssessmentResult* pronunciationAssessmentResult = [[SPXPronunciationAssessmentResult alloc] init:speechRecognitionResult];

// The pronunciation assessment result as a JSON string
NSString* pronunciationAssessmentResultJson = [speechRecognitionResult.properties getPropertyById:SPXSpeechServiceResponseJsonResult];


let speechRecognizer = try! SPXSpeechRecognizer(speechConfiguration: speechConfig, audioConfiguration: audioConfig)

try! pronAssessmentConfig.apply(to: speechRecognizer)

let speechRecognitionResult = try? speechRecognizer.recognizeOnce()

// The pronunciation assessment result as a Speech SDK object
let pronunciationAssessmentResult = SPXPronunciationAssessmentResult(speechRecognitionResult!)

// The pronunciation assessment result as a JSON string
let pronunciationAssessmentResultJson = speechRecognitionResult!.properties?.getPropertyBy(SPXPropertyId.speechServiceResponseJsonResult)
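If you work with the JSON string rather than SDK objects (for example, for word-level results in C++ or JRE Java), a minimal Python sketch using only the standard `json` module can read the full-text scores. It assumes the scores sit in the `PronunciationAssessment` object of the first `NBest` entry, as in the JSON result example later in this article. `overall_scores` is a hypothetical helper name.

```python
import json


def overall_scores(result_json: str) -> dict:
    """Return the full-text PronunciationAssessment scores of the top NBest entry."""
    result = json.loads(result_json)
    nbest = result.get("NBest") or []
    if not nbest:
        # No recognition hypotheses, so there is nothing to score.
        return {}
    return dict(nbest[0].get("PronunciationAssessment", {}))


# Trimmed from the "hello" JSON result example in this article.
sample = """{"RecognitionStatus": 0, "DisplayText": "Hello.",
  "NBest": [{"Display": "Hello.", "PronunciationAssessment":
    {"AccuracyScore": 100, "FluencyScore": 100, "CompletenessScore": 100, "PronScore": 100}}]}"""
print(overall_scores(sample))
```

In the Python SDK, the string passed in would be the value returned by `speech_recognition_result.properties.get(speechsdk.PropertyId.SpeechServiceResponse_JsonResult)`.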

Result parameters

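Depending on whether you use scripted or unscripted assessment, you get different pronunciation assessment results. Scripted assessment is for the reading language learning scenario. Unscripted assessment is for the speaking language learning scenario.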

Note

For the pricing difference between scripted and unscripted assessment, see Pricing.

Scripted assessment results

The following table lists some of the key pronunciation assessment results for the scripted assessment, or reading scenario.

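Parameter	Description	Granularity
`AccuracyScore`	Pronunciation accuracy of the speech. Accuracy indicates how closely the phonemes match a native speaker's pronunciation. Syllable, word, and full text accuracy scores are aggregated from the phoneme-level accuracy scores and refined with assessment objectives.	Phoneme level, Syllable level (en-US only), Word level, Full text level
`FluencyScore`	Fluency of the given speech. Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words.	Full text level
`CompletenessScore`	Completeness of the speech, calculated as the ratio of pronounced words to the input reference text.	Full text level
`ProsodyScore`	Prosody of the given speech. Prosody indicates how natural the given speech is, including stress, intonation, speaking speed, and rhythm.	Full text level
`PronScore`	Overall score that indicates the pronunciation quality of the given speech. `PronScore` is a weighted aggregate of `AccuracyScore`, `FluencyScore`, and `CompletenessScore`.	Full text level
`ErrorType`	The type of error compared to the reference text. Options include whether a word is omitted, inserted, or improperly inserted with a break, and whether a break is missing at punctuation. It also indicates whether a word is badly pronounced, or whether the utterance is monotonically rising, falling, or flat. Possible values are `None` (no error on this word), `Omission`, `Insertion`, `Mispronunciation`, `UnexpectedBreak`, `MissingBreak`, and `Monotone`. The error type can be `Mispronunciation` when the pronunciation `AccuracyScore` for a word is below 60.	Word level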
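To surface the word-level `ErrorType` values, you can walk the `Words` array of an `NBest` entry. The Python sketch below assumes the layout of the JSON result example in this article. `words_with_errors` is a hypothetical helper, and the sample entry uses made-up values.

```python
def words_with_errors(nbest_entry: dict) -> list:
    """Return (Word, ErrorType, AccuracyScore) for each word whose ErrorType isn't "None"."""
    flagged = []
    for word in nbest_entry.get("Words", []):
        assessment = word.get("PronunciationAssessment", {})
        error_type = assessment.get("ErrorType", "None")
        if error_type != "None":
            # AccuracyScore may be absent for some error types, so use .get().
            flagged.append((word.get("Word"), error_type, assessment.get("AccuracyScore")))
    return flagged


# Hypothetical NBest entry (made-up values) in the shape of the JSON result example.
entry = {
    "Words": [
        {"Word": "good", "PronunciationAssessment": {"AccuracyScore": 95.0, "ErrorType": "None"}},
        {"Word": "morning", "PronunciationAssessment": {"AccuracyScore": 42.0, "ErrorType": "Mispronunciation"}},
    ]
}
print(words_with_errors(entry))
```

`Omission` and `Insertion` appear only when miscue calculation is enabled, so the same loop also picks them up in that case.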

Unscripted assessment results

The following table lists some of the key pronunciation assessment results for the unscripted assessment, or speaking scenario.

The VocabularyScore, GrammarScore, and TopicScore parameters roll up into the combined content assessment.

Note

Content and prosody assessments are only available in the en-US locale.

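Response parameter	Description	Granularity
`AccuracyScore`	Pronunciation accuracy of the speech. Accuracy indicates how closely the phonemes match a native speaker's pronunciation. Syllable, word, and full text accuracy scores are aggregated from the phoneme-level accuracy scores and refined with assessment objectives.	Phoneme level, Syllable level (en-US only), Word level, Full text level
`FluencyScore`	Fluency of the given speech. Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words.	Full text level
`ProsodyScore`	Prosody of the given speech. Prosody indicates how natural the given speech is, including stress, intonation, speaking speed, and rhythm.	Full text level
`VocabularyScore`	Proficiency in lexical usage. It evaluates how effectively the speaker uses words, how appropriate they are in the given context for expressing ideas accurately, and the level of lexical complexity.	Full text level
`GrammarScore`	Correctness in using grammar and variety of sentence patterns. Grammatical errors are jointly evaluated by lexical accuracy, grammatical accuracy, and diversity of sentence structures.	Full text level
`TopicScore`	Level of understanding of and engagement with the topic. It gives insight into the speaker's ability to express thoughts and ideas effectively and to engage with the topic.	Full text level
`PronScore`	Overall score that indicates the pronunciation quality of the given speech. This value is a weighted aggregate of `AccuracyScore`, `FluencyScore`, and `CompletenessScore`.	Full text level
`ErrorType`	Indicates that a word is badly pronounced, improperly inserted with a break, or missing a break at punctuation. It also indicates whether the utterance is monotonically rising, falling, or flat. Possible values are `None` (no error on this word), `Mispronunciation`, `UnexpectedBreak`, `MissingBreak`, and `Monotone`.	Word level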

The following table describes the prosody assessment results in more detail:

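Field	Description
`ProsodyScore`	Prosody score of the entire utterance.
`Feedback`	Feedback at the word level, including `Break` and `Intonation`.
`Break`
`ErrorTypes`	Error types related to breaks, including `UnexpectedBreak` and `MissingBreak`. The current version doesn't provide the break error type. Set thresholds on the `UnexpectedBreak - Confidence` and `MissingBreak - Confidence` fields to decide whether there's an unexpected break or a missing break before the word.
`UnexpectedBreak`	Indicates an unexpected break before the word.
`MissingBreak`	Indicates a missing break before the word.
`Thresholds`	The suggested threshold for both confidence scores is 0.75. If the value of `UnexpectedBreak - Confidence` is greater than 0.75, there's an unexpected break. If the value of `MissingBreak - Confidence` is greater than 0.75, there's a missing break. If you want different detection sensitivity for the two break types, assign different thresholds to the `UnexpectedBreak - Confidence` and `MissingBreak - Confidence` fields.
`Intonation`	Indicates intonation in the speech.
`ErrorTypes`	Error types related to intonation. Currently only monotone is supported. If `Monotone` is present in the `ErrorTypes` field, the utterance is detected as monotonic. Monotone is detected on the whole utterance, but the tag is assigned to every word, so all words in the same utterance share the same monotone detection information.
`Monotone`	Indicates monotonic speech.
`Thresholds (Monotone Confidence)`	The `Monotone - SyllablePitchDeltaConfidence` fields are reserved for user-customized monotone detection. If you're not satisfied with the provided monotone decision, adjust the thresholds on these fields to customize detection to your preferences.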
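The threshold rules above can be sketched in Python as follows. The nesting of `Break` and `Intonation` inside the `prosody` dictionary is an assumption based on the field table, so check it against your own JSON output. `prosody_issues` is a hypothetical helper, and the sample confidences are made up.

```python
def prosody_issues(prosody: dict,
                   unexpected_threshold: float = 0.75,
                   missing_threshold: float = 0.75) -> list:
    """Return the prosody issues detected for one word.

    Assumed (hypothetical) layout of `prosody`, based on the field table:
        {"Break": {"UnexpectedBreak": {"Confidence": float},
                   "MissingBreak": {"Confidence": float}},
         "Intonation": {"ErrorTypes": [str, ...]}}
    """
    issues = []
    brk = prosody.get("Break", {})
    # A break issue is reported only when its confidence is strictly greater than the threshold.
    if brk.get("UnexpectedBreak", {}).get("Confidence", 0.0) > unexpected_threshold:
        issues.append("UnexpectedBreak")
    if brk.get("MissingBreak", {}).get("Confidence", 0.0) > missing_threshold:
        issues.append("MissingBreak")
    # Monotone is detected per utterance but tagged on every word.
    if "Monotone" in prosody.get("Intonation", {}).get("ErrorTypes", []):
        issues.append("Monotone")
    return issues


# Made-up confidence values for illustration.
feedback = {
    "Break": {"UnexpectedBreak": {"Confidence": 0.9}, "MissingBreak": {"Confidence": 0.3}},
    "Intonation": {"ErrorTypes": ["Monotone"]},
}
print(prosody_issues(feedback))                          # suggested 0.75 thresholds
print(prosody_issues(feedback, missing_threshold=0.25))  # more sensitive to missing breaks
```

Separate threshold parameters reflect the table's suggestion that the two break types can use different sensitivities.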

JSON result example

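The following example shows the scripted pronunciation assessment result for the spoken word "hello" as a JSON string.

The phoneme alphabet is IPA.
The syllables are returned alongside the phonemes for the same word.
You can use the Offset and Duration values to align syllables with their corresponding phonemes. For example, the starting offset (11700000) of the second syllable, loʊ, aligns with the third phoneme, l. The offset represents the time at which the recognized speech begins in the audio stream, measured in 100-nanosecond units. To learn more about Offset and Duration, see response properties.
There are five NBestPhonemes, which corresponds to the number of spoken phonemes requested.
Within Phonemes, the most likely spoken phoneme was ə instead of the expected phoneme ɛ. The expected phoneme ɛ received a confidence score of only 47. The other candidates, l, h, and æ, received confidence scores of 52, 17, and 2.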

{
    "Id": "bbb42ea51bdb46d19a1d685e635fe173",
    "RecognitionStatus": 0,
    "Offset": 7500000,
    "Duration": 13800000,
    "DisplayText": "Hello.",
    "NBest": [
        {
            "Confidence": 0.975003,
            "Lexical": "hello",
            "ITN": "hello",
            "MaskedITN": "hello",
            "Display": "Hello.",
            "PronunciationAssessment": {
                "AccuracyScore": 100,
                "FluencyScore": 100,
                "CompletenessScore": 100,
                "PronScore": 100
            },
            "Words": [
                {
                    "Word": "hello",
                    "Offset": 7500000,
                    "Duration": 13800000,
                    "PronunciationAssessment": {
                        "AccuracyScore": 99.0,
                        "ErrorType": "None"
                    },
                    "Syllables": [
                        {
                            "Syllable": "hɛ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 91.0
                            },
                            "Offset": 7500000,
                            "Duration": 4100000
                        },
                        {
                            "Syllable": "loʊ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 100.0
                            },
                            "Offset": 11700000,
                            "Duration": 9600000
                        }
                    ],
                    "Phonemes": [
                        {
                            "Phoneme": "h",
                            "PronunciationAssessment": {
                                "AccuracyScore": 98.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "h",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "oʊ",
                                        "Score": 52.0
                                    },
                                    {
                                        "Phoneme": "ə",
                                        "Score": 35.0
                                    },
                                    {
                                        "Phoneme": "k",
                                        "Score": 23.0
                                    },
                                    {
                                        "Phoneme": "æ",
                                        "Score": 20.0
                                    }
                                ]
                            },
                            "Offset": 7500000,
                            "Duration": 3500000
                        },
                        {
                            "Phoneme": "ɛ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 47.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "ə",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "l",
                                        "Score": 52.0
                                    },
                                    {
                                        "Phoneme": "ɛ",
                                        "Score": 47.0
                                    },
                                    {
                                        "Phoneme": "h",
                                        "Score": 17.0
                                    },
                                    {
                                        "Phoneme": "æ",
                                        "Score": 2.0
                                    }
                                ]
                            },
                            "Offset": 11100000,
                            "Duration": 500000
                        },
                        {
                            "Phoneme": "l",
                            "PronunciationAssessment": {
                                "AccuracyScore": 100.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "l",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "oʊ",
                                        "Score": 46.0
                                    },
                                    {
                                        "Phoneme": "ə",
                                        "Score": 5.0
                                    },
                                    {
                                        "Phoneme": "ɛ",
                                        "Score": 3.0
                                    },
                                    {
                                        "Phoneme": "u",
                                        "Score": 1.0
                                    }
                                ]
                            },
                            "Offset": 11700000,
                            "Duration": 1100000
                        },
                        {
                            "Phoneme": "oʊ",
                            "PronunciationAssessment": {
                                "AccuracyScore": 100.0,
                                "NBestPhonemes": [
                                    {
                                        "Phoneme": "oʊ",
                                        "Score": 100.0
                                    },
                                    {
                                        "Phoneme": "d",
                                        "Score": 29.0
                                    },
                                    {
                                        "Phoneme": "t",
                                        "Score": 24.0
                                    },
                                    {
                                        "Phoneme": "n",
                                        "Score": 22.0
                                    },
                                    {
                                        "Phoneme": "l",
                                        "Score": 18.0
                                    }
                                ]
                            },
                            "Offset": 12900000,
                            "Duration": 8400000
                        }
                    ]
                }
            ]
        }
    ]
}
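The offset-based alignment described above can be sketched in Python. It assumes that a phoneme belongs to the syllable whose time span (`Offset` to `Offset` + `Duration`) contains the phoneme's starting offset. `phonemes_in_syllable` and `ticks_to_seconds` are hypothetical helpers, and the offsets are copied from the JSON example above.

```python
# Offset and Duration are measured in 100-nanosecond units (ticks).
TICKS_PER_SECOND = 10_000_000


def ticks_to_seconds(ticks: int) -> float:
    """Convert an Offset or Duration value to seconds."""
    return ticks / TICKS_PER_SECOND


def phonemes_in_syllable(word: dict, syllable: dict) -> list:
    """Return the phonemes whose Offset falls inside the syllable's time span.

    Assumption: a phoneme belongs to the syllable whose [Offset, Offset + Duration)
    interval contains the phoneme's starting Offset.
    """
    start = syllable["Offset"]
    end = start + syllable["Duration"]
    return [p["Phoneme"] for p in word.get("Phonemes", []) if start <= p["Offset"] < end]


# Offsets and durations copied from the "hello" JSON example (scores omitted).
word = {
    "Word": "hello",
    "Syllables": [
        {"Syllable": "hɛ", "Offset": 7500000, "Duration": 4100000},
        {"Syllable": "loʊ", "Offset": 11700000, "Duration": 9600000},
    ],
    "Phonemes": [
        {"Phoneme": "h", "Offset": 7500000, "Duration": 3500000},
        {"Phoneme": "ɛ", "Offset": 11100000, "Duration": 500000},
        {"Phoneme": "l", "Offset": 11700000, "Duration": 1100000},
        {"Phoneme": "oʊ", "Offset": 12900000, "Duration": 8400000},
    ],
}

for syllable in word["Syllables"]:
    print(syllable["Syllable"], phonemes_in_syllable(word, syllable),
          f"starts at {ticks_to_seconds(syllable['Offset'])} s")
```

Run as is, it groups h and ɛ under hɛ (starting at 0.75 s) and l and oʊ under loʊ (starting at 1.17 s), which matches the syllable boundaries in the example.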

You can get pronunciation assessment scores for:

Full text
Words
Syllable groups
Phonemes in SAPI or IPA format

Supported features per locale

The following table summarizes which features each locale supports. For details, see the following sections.

Phoneme alphabet	IPA	SAPI
Phoneme name	`en-US`	`en-US`, `zh-CN`
Syllable group	`en-US`	`en-US`
Spoken phoneme	`en-US`	`en-US`

Syllable groups

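Pronunciation assessment can provide syllable-level assessment results. A word is typically pronounced syllable by syllable rather than phoneme by phoneme, so grouping by syllable is more legible and closer to how people speak.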

Pronunciation assessment supports syllable groups only in en-US, with both IPA and SAPI.

The following table compares example phonemes with the corresponding syllables.

Sample word	Phonemes	Syllables
technological	teknələdʒɪkl	tek·nə·lɑ·dʒɪkl
hello	hɛloʊ	hɛ·loʊ
luck	lʌk	lʌk
photosynthesis	foʊtəsɪnθəsɪs	foʊ·tə·sɪn·θə·sɪs

To request syllable-level results along with phonemes, set the granularity configuration parameter to Phoneme.

Phoneme alphabet format

Pronunciation assessment supports phoneme names in en-US with IPA, and in en-US and zh-CN with SAPI.

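For locales that support phoneme names, the phoneme name is provided together with the score. Phoneme names help you identify which phonemes were pronounced accurately and which weren't. For other locales, you can get only the phoneme score.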

The following table compares example SAPI phonemes with the corresponding IPA phonemes.

Sample word	SAPI phonemes	IPA phonemes
hello	h eh l ow	h ɛ l oʊ
luck	l ah k	l ʌ k
photosynthesis	f ow t ax s ih n th ax s ih s	f oʊ t ə s ɪ n θ ə s ɪ s
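Setting the phoneme alphabet to IPA, as described next, is the direct way to get IPA output. If you already have SAPI results and only need to read them against this table, the Python sketch below converts phoneme strings with a lookup built solely from the three sample rows. `SAPI_TO_IPA` and `sapi_to_ipa` are hypothetical, and the mapping is deliberately incomplete.

```python
# Partial SAPI -> IPA lookup built ONLY from the three sample rows in the table;
# it does not cover the full en-US phoneme set.
SAPI_TO_IPA = {
    "h": "h", "eh": "ɛ", "l": "l", "ow": "oʊ",          # hello
    "ah": "ʌ", "k": "k",                                # luck
    "f": "f", "t": "t", "ax": "ə", "s": "s",            # photosynthesis
    "ih": "ɪ", "n": "n", "th": "θ",
}


def sapi_to_ipa(sapi: str) -> str:
    """Convert a space-separated SAPI phoneme string to space-separated IPA."""
    ipa = []
    for phone in sapi.split():
        if phone not in SAPI_TO_IPA:
            raise KeyError(f"no IPA mapping for SAPI phoneme {phone!r} in this partial table")
        ipa.append(SAPI_TO_IPA[phone])
    return " ".join(ipa)


print(sapi_to_ipa("h eh l ow"))
```

Raising `KeyError` for unknown phonemes keeps the sketch from silently producing a wrong transcription for phonemes outside the table.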

To request IPA phonemes, set the phoneme alphabet to IPA. If you don't specify the alphabet, the phonemes are in SAPI format by default.

pronunciationAssessmentConfig.PhonemeAlphabet = "IPA";

auto pronunciationAssessmentConfig = PronunciationAssessmentConfig::CreateFromJson("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\"}");

PronunciationAssessmentConfig pronunciationAssessmentConfig = PronunciationAssessmentConfig.fromJson("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\"}");

pronunciation_assessment_config = speechsdk.PronunciationAssessmentConfig(json_string="{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\"}")

var pronunciationAssessmentConfig = SpeechSDK.PronunciationAssessmentConfig.fromJSON("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\"}");

pronunciationAssessmentConfig.phonemeAlphabet = @"IPA";

pronunciationAssessmentConfig?.phonemeAlphabet = "IPA"

Assess spoken phonemes

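With spoken phonemes, you can get confidence scores that indicate how likely it is that the spoken phonemes matched the expected phonemes.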

Pronunciation assessment supports spoken phonemes in en-US, with both IPA and SAPI.

For example, to get the complete spoken pronunciation of the word hello, you can concatenate, for each expected phoneme, the spoken candidate with the highest confidence score. In the JSON result example earlier in this article, the expected IPA phonemes for hello are h ɛ l oʊ, but the actual spoken phonemes are h ə l oʊ. Each expected phoneme has five possible candidates in that example. The assessment result shows that the most likely spoken phoneme was ə instead of the expected phoneme ɛ. The expected phoneme ɛ received a confidence score of only 47. The other candidates, l, h, and æ, received confidence scores of 52, 17, and 2.
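The concatenation described above can be sketched in Python. For each expected phoneme, it takes the `NBestPhonemes` candidate with the highest `Score`. It assumes `nBestPhonemeCount` was set so that candidates are present. Falling back to the expected phoneme when no candidates exist is this sketch's own choice, and `spoken_phonemes` is a hypothetical helper. The sample is trimmed from the hello example to the top three candidates per phoneme.

```python
def spoken_phonemes(word: dict) -> list:
    """Pick, for each expected phoneme, the NBestPhonemes candidate with the top Score."""
    spoken = []
    for phoneme in word.get("Phonemes", []):
        candidates = phoneme.get("PronunciationAssessment", {}).get("NBestPhonemes", [])
        if candidates:
            # max() rather than candidates[0], in case the list isn't sorted by Score.
            spoken.append(max(candidates, key=lambda c: c["Score"])["Phoneme"])
        else:
            # Assumption: without NBest data, fall back to the expected phoneme.
            spoken.append(phoneme["Phoneme"])
    return spoken


def nbest(*pairs):
    """Build a PronunciationAssessment dict holding NBestPhonemes from (phoneme, score) pairs."""
    return {"NBestPhonemes": [{"Phoneme": p, "Score": s} for p, s in pairs]}


# Trimmed from the "hello" JSON example: top three candidates per phoneme.
word = {
    "Word": "hello",
    "Phonemes": [
        {"Phoneme": "h", "PronunciationAssessment": nbest(("h", 100.0), ("oʊ", 52.0), ("ə", 35.0))},
        {"Phoneme": "ɛ", "PronunciationAssessment": nbest(("ə", 100.0), ("l", 52.0), ("ɛ", 47.0))},
        {"Phoneme": "l", "PronunciationAssessment": nbest(("l", 100.0), ("oʊ", 46.0), ("ə", 5.0))},
        {"Phoneme": "oʊ", "PronunciationAssessment": nbest(("oʊ", 100.0), ("d", 29.0), ("t", 24.0))},
    ],
}

expected = [p["Phoneme"] for p in word["Phonemes"]]
print("expected:", " ".join(expected))
print("spoken:", " ".join(spoken_phonemes(word)))
```

The sketch prints h ɛ l oʊ as the expected phonemes and h ə l oʊ as the spoken ones, which reproduces the ə-for-ɛ substitution discussed above.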

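To request confidence scores for potential spoken phonemes, and to choose how many candidates to return, set the NBestPhonemeCount parameter to an integer value such as 5.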

pronunciationAssessmentConfig.NBestPhonemeCount = 5;

auto pronunciationAssessmentConfig = PronunciationAssessmentConfig::CreateFromJson("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\",\"nBestPhonemeCount\":5}");

PronunciationAssessmentConfig pronunciationAssessmentConfig = PronunciationAssessmentConfig.fromJson("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\",\"nBestPhonemeCount\":5}");

pronunciation_assessment_config = speechsdk.PronunciationAssessmentConfig(json_string="{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\",\"nBestPhonemeCount\":5}")

var pronunciationAssessmentConfig = SpeechSDK.PronunciationAssessmentConfig.fromJSON("{\"referenceText\":\"good morning\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"phonemeAlphabet\":\"IPA\",\"nBestPhonemeCount\":5}");

pronunciationAssessmentConfig.nbestPhonemeCount = 5;

pronunciationAssessmentConfig?.nbestPhonemeCount = 5
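The JSON-based constructors above take a hand-escaped string. A minimal Python sketch that builds the same JSON with `json.dumps` correctly escapes quotes and other special characters in the reference text. `pron_config_json` is a hypothetical helper, and the keys are the ones used in the examples above.

```python
import json


def pron_config_json(reference_text: str,
                     grading_system: str = "HundredMark",
                     granularity: str = "Phoneme",
                     phoneme_alphabet: str = "IPA",
                     nbest_phoneme_count: int = 5) -> str:
    """Build the JSON string passed to PronunciationAssessmentConfig(json_string=...)."""
    return json.dumps({
        "referenceText": reference_text,
        "gradingSystem": grading_system,
        "granularity": granularity,
        "phonemeAlphabet": phoneme_alphabet,
        "nBestPhonemeCount": nbest_phoneme_count,
    })


print(pron_config_json("good morning"))

# With the Speech SDK for Python installed (not needed to run this sketch):
# import azure.cognitiveservices.speech as speechsdk
# config = speechsdk.PronunciationAssessmentConfig(json_string=pron_config_json("good morning"))
```

Generating the string from a dictionary avoids mismatched escapes, which are easy to introduce when reference texts contain quotation marks.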

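Related content

Learn about the quality benchmark.
Try pronunciation assessment in Speech Studio.
Explore the easy-to-deploy pronunciation assessment demo.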
