Speech Service release notes

Text-to-speech 2020-August release

New features

  • Neural TTS: new speaking style for the en-US Aria voice. AriaNeural can sound like a newscaster when reading news. The 'newscast-formal' style sounds more serious, while the 'newscast-casual' style is more relaxed and informal. See how to use the speaking styles in SSML.

  • Audio Content Creation: a set of new features that enable more powerful voice tuning and audio management.

    • Pronunciation: The pronunciation tuning feature is updated to the latest phoneme set. You can pick the right phoneme element from the library and refine the pronunciation of the words you have selected.

    • Download: The audio "Download"/"Export" feature now supports generating audio by paragraph. You can edit content in the same file/SSML while generating multiple audio outputs. The file structure of "Download" is refined as well, so you can now easily get all audio files in one folder.

    • Task status: The multi-file export experience is improved. Previously, if one file failed during a multi-file export, the entire task failed. Now, all other files are still exported successfully. The task report is enriched with more detailed and structured information, and you can use it to check the logs for all failed files and sentences.

    • SSML documentation: The tool now links to the SSML documentation so you can check the rules for using all tuning features.

  • The Voice List API is updated to include a user-friendly display name and the speaking styles supported for neural voices.
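
As a rough illustration of the new Aria speaking styles, the following Python sketch builds an SSML document that requests one of the two newscast styles. The `mstts:express-as` element, the `mstts` namespace URI, and the voice name `en-US-AriaNeural` reflect our understanding of the service's SSML schema and should be checked against the SSML documentation; `build_newscast_ssml` is a hypothetical helper, not part of any SDK.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Style names taken from the release note above.
NEWSCAST_STYLES = ("newscast-formal", "newscast-casual")


def build_newscast_ssml(text: str, style: str = "newscast-formal") -> str:
    """Return SSML asking the Aria neural voice to read `text` in a newscast style."""
    if style not in NEWSCAST_STYLES:
        raise ValueError(f"unsupported style: {style!r}")
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
        '<voice name="en-US-AriaNeural">'  # assumed voice name
        f'<mstts:express-as style="{style}">{escape(text)}</mstts:express-as>'
        "</voice></speak>"
    )


ssml = build_newscast_ssml("Markets closed higher today.", "newscast-casual")
ET.fromstring(ssml)  # raises ParseError if the SSML is not well-formed XML
print(ssml)
```

The resulting string can be passed to whichever synthesis call you use that accepts SSML input; the sketch only checks that the markup is well-formed, not that the service accepts it.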

General TTS voice quality improvements

  • Reduced the word-level pronunciation error rate for ru-RU (errors reduced by 56%) and sv-SE (errors reduced by 49%).

  • Improved polyphony word reading on en-US neural voices by 40%. Examples of polyphony words include "read", "live", "content", "record", and "object".

  • Improved the naturalness of the question tone in fr-FR. MOS (Mean Opinion Score) gain: +0.28.

  • Updated the vocoders for the following voices, with fidelity improvements and an overall performance speed-up of 40%.

    Locale   Voice
    en-GB    Mia
    es-MX    Dalia
    fr-CA    Sylvie
    fr-FR    Denise
    ja-JP    Nanami
    ko-KR    Sun-Hi

Bug fixes

  • Fixed a number of bugs in the Audio Content Creation tool.
    • Fixed an issue with auto refreshing.
    • Fixed stability issues, including an export error with the 'break' tag and errors in punctuation.

Speech SDK 1.13.0: 2020-July release

Note: The Speech SDK on Windows depends on the shared Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019. Download and install it from here.

New features

  • JavaScript: Added support for automatic language detection/language ID. See documentation here.
  • Python: Added compressed audio support for Python on Windows and Linux. See documentation here.

Bug fixes

  • All: Fixed an issue where SendMessageAsync did not actually send the message over the wire by the time the caller finished waiting for it.
  • JavaScript: Fixed an issue with throttling when the browser is minimized.
  • JavaScript: Fixed a memory leak on streams.
  • JavaScript: Added caching for OCSP responses from NodeJS.
  • Java: Fixed an issue that was causing BigInteger fields to always return 0.
  • iOS: Fixed an issue with publishing Speech SDK based apps in the iOS App Store.

COVID-19 abridged testing: Due to working remotely over the last few weeks, we couldn't do as much manual verification testing as we normally do. We haven't made any changes we think could have broken anything, and our automated tests all passed. In the unlikely event that we missed something, please let us know on GitHub.
Stay healthy!

Text-to-speech 2020-July release

New features

  • Neural TTS, 15 new neural voices: The new voices added to the Neural TTS portfolio are Salma in ar-EG Arabic (Egypt), Zariyah in ar-SA Arabic (Saudi Arabia), Alba in ca-ES Catalan (Spain), Christel in da-DK Danish (Denmark), Neerja in en-IN English (India), Noora in fi-FI Finnish (Finland), Swara in hi-IN Hindi (India), Colette in nl-NL Dutch (Netherlands), Zofia in pl-PL Polish (Poland), Fernanda in pt-PT Portuguese (Portugal), Dariya in ru-RU Russian (Russia), Hillevi in sv-SE Swedish (Sweden), Achara in th-TH Thai (Thailand), HiuGaai in zh-HK Chinese (Cantonese, Traditional) and HsiaoYu in zh-TW Chinese (Taiwanese Mandarin). Check all supported languages.

  • Custom Voice, streamlined voice testing in the training flow to simplify the user experience: With the new testing feature, each voice is automatically tested with a predefined test set optimized for each language to cover general and voice assistant scenarios. These test sets are carefully selected and tested to include typical use cases and phonemes in the language. In addition, users can still choose to upload their own test scripts when training a model.

  • Audio Content Creation: a set of new features is released to enable more powerful voice tuning and audio management.

    • Pitch, rate, and volume are enhanced to support tuning with a predefined value, like slow, medium and fast. It's now straightforward for users to pick a 'constant' value for their audio editing.

    [Image: Audio tuning]

    • Users can now review the Audio history for their work file. With this feature, users can easily track all the generated audio related to a working file. They can check earlier versions and compare quality while tuning.

    [Image: Audio history]

    • The Clear feature is now more flexible. Users can clear a specific tuning parameter while keeping other parameters available for the selected content.

    • A tutorial video was added on the landing page to help users quickly get started with TTS voice tuning and audio management.
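
To make the predefined pitch, rate, and volume values concrete, here is a minimal Python sketch that emits an SSML `prosody` fragment using named presets. It assumes the tool's presets correspond to the named values defined for the `prosody` element in the W3C SSML specification; the helper `prosody` and the preset tuples are illustrative and not taken from the tool itself.

```python
from xml.sax.saxutils import escape

# Named values from the W3C SSML prosody element (assumed to match the tool's presets).
RATES = ("x-slow", "slow", "medium", "fast", "x-fast")
PITCHES = ("x-low", "low", "medium", "high", "x-high")
VOLUMES = ("silent", "x-soft", "soft", "medium", "loud", "x-loud")


def prosody(text: str, rate: str = "medium", pitch: str = "medium",
            volume: str = "medium") -> str:
    """Wrap `text` in a prosody element that uses only predefined values."""
    checks = ((rate, RATES, "rate"), (pitch, PITCHES, "pitch"), (volume, VOLUMES, "volume"))
    for value, allowed, name in checks:
        if value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    return (f'<prosody rate="{rate}" pitch="{pitch}" volume="{volume}">'
            f"{escape(text)}</prosody>")


fragment = prosody("Take your time & relax.", rate="slow", volume="soft")
print(fragment)
```

Restricting the arguments to named presets mirrors the "constant value" idea in the note: the caller picks a label rather than computing a percentage or a frequency offset.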

General TTS voice quality improvements

  • Improved the TTS vocoder for higher fidelity and lower latency.

    • Updated Elsa in it-IT to a new vocoder, which achieved a +0.464 CMOS (Comparative Mean Opinion Score) gain in voice quality, 40% faster synthesis, and a 30% reduction in first-byte latency.
    • Updated Xiaoxiao in zh-CN to the new vocoder, with a +0.148 CMOS gain for the general domain, +0.348 for the newscast style, and +0.195 for the lyrical style.
  • Updated de-DE and ja-JP voice models to make the TTS output more natural.

    • Updated Katja in de-DE with the latest prosody modeling method. The MOS (Mean Opinion Score) gain is +0.13.
    • Updated Nanami in ja-JP with a new pitch accent prosody model. The MOS (Mean Opinion Score) gain is +0.19.
  • Improved word-level pronunciation accuracy in 5 languages.

    Language   Pronunciation error reduction
    en-GB      51%
    ko-KR      17%
    pt-BR      39%
    pt-PT      77%
    id-ID      46%

Bug fixes

  • Currency reading

    • Fixed the issue with currency reading for es-ES and es-MX.

      Language   Input   Readout after improvement
      es-MX      $1.58   un peso cincuenta y ocho centavos
      es-ES      $1.58   un dólar cincuenta y ocho centavos

    • Added support for negative currency (like "-325 €") in the following locales: en-US, en-GB, fr-FR, it-IT, en-AU, en-CA.
  • Improved address reading in pt-PT.

  • Fixed Natasha (en-AU) and Libby (en-GB) pronunciation issues with the words "for" and "four".

  • Fixed bugs in the Audio Content Creation tool.

    • Fixed the additional, unexpected pause after the second paragraph.
    • Restored the 'No break' feature, which had been lost to a regression bug.
    • Fixed the random refresh issue in Speech Studio.

Samples/SDK

  • JavaScript: Fixed a playback issue in Firefox, and in Safari on macOS and iOS.

Speech SDK 1.12.1: 2020-June release

Speech CLI (also known as SPX)

  • Added in-CLI help search features:
    • spx help find --text TEXT
    • spx help find --topic NAME
  • Updated to work with the newly deployed v3.0 Batch and Custom Speech APIs:
    • spx help batch examples
    • spx help csr examples

Bug fixes

  • JavaScript: Fixes for Text-To-Speech in Firefox, and in Safari on macOS and iOS.
  • Fix for a Windows application verifier access violation crash on conversation transcription when using an 8-channel stream.
  • Fix for a Windows application verifier access violation crash on multi-device conversation translation.

Samples

COVID-19 abridged testing: Due to working remotely over the last few weeks, we couldn't do as much manual verification testing as we normally do. We haven't made any changes we think could have broken anything, and our automated tests all passed. In the unlikely event that we missed something, please let us know on GitHub.
Stay healthy!

Speech SDK 1.12.0: 2020-May release

Speech CLI (also known as SPX)

  • SPX is a new command line tool that allows you to perform recognition, synthesis, translation, batch transcription, and custom speech management from the command line. Use it to test the Speech Service, or to script the Speech Service tasks you need to perform. Download the tool and read the documentation here.

New features

  • Go: New Go language support for speech recognition. Set up your dev environment here. For sample code, see the Samples section below.
  • JavaScript: Added browser support for Text-To-Speech. See documentation here.
  • C++, C#, Java: New KeywordRecognizer object and APIs supported on Windows, Android, Linux, and iOS platforms. Read the documentation here. For sample code, see the Samples section below.
  • Java: Added multi-device conversation with translation support. See the reference doc here.

Improvements & Optimizations

  • JavaScript: Optimized the browser microphone implementation, improving speech recognition accuracy.
  • Java: Refactored bindings using a direct JNI implementation without SWIG. This reduces the bindings size by 10x for all Java packages used for Windows, Android, Linux and Mac, and eases further development of the Speech SDK Java implementation.
  • Linux: Updated support documentation with the latest RHEL 7 specific notes.
  • Improved connection logic to attempt connecting multiple times in case of service and network errors.
  • Updated the portal.azure.cn Speech Quickstart page to help developers take the next step in the Azure Speech journey.
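
The note on connection logic does not describe how the SDK retries internally. Purely as an illustration of the general idea of attempting a connection several times, the following Python sketch retries a caller-supplied function with exponential backoff. The function `connect_with_retry`, its parameters, and the choice of backoff are all assumptions made for this example and are not the SDK's implementation.

```python
import time


def connect_with_retry(connect, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or `attempts` tries have failed.

    The delay doubles after each failed try (base_delay, 2x, 4x, ...).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of tries: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)


# Demo with a hypothetical connection that fails twice, then succeeds.
calls = {"count": 0}


def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("service unavailable")
    return "connected"


result = connect_with_retry(flaky_connect, attempts=5, sleep=lambda seconds: None)
print(result, "after", calls["count"], "tries")
```

Passing `sleep` as a parameter keeps the demo instant and makes the retry loop easy to test without real delays.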

Bug fixes

  • C#, Java: Fixed an issue with loading SDK libraries on Linux ARM (both 32-bit and 64-bit).
  • C#: Fixed explicit disposal of native handles for TranslationRecognizer, IntentRecognizer and Connection objects.
  • C#: Fixed audio input lifetime management for the ConversationTranscriber object.
  • Fixed an issue where the IntentRecognizer result reason was not set properly when recognizing intents from simple phrases.
  • Fixed an issue where the SpeechRecognitionEventArgs result offset was not set correctly.
  • Fixed a race condition where the SDK tried to send a network message before opening the websocket connection. It was reproducible for TranslationRecognizer while adding participants.
  • Fixed memory leaks in the keyword recognizer engine.

Samples

COVID-19 abridged testing: Due to working remotely over the last few weeks, we couldn't do as much manual verification testing as we normally do. We haven't made any changes we think could have broken anything, and our automated tests all passed. In the unlikely event that we missed something, please let us know on GitHub.
Stay healthy!

Speech SDK 1.11.0: 2020-March release

New features

  • Linux: Added support for Red Hat Enterprise Linux (RHEL)/CentOS 7 x64 with instructions on how to configure the system for the Speech SDK.
  • Linux: Added support for .NET Core C# on Linux ARM32 and ARM64. Read more here.
  • C#, C++: Added UtteranceId in ConversationTranscriptionResult, an ID that stays consistent across all intermediate results and the final speech recognition result. Details for C#, C++.
  • Python: Added support for Language ID. See speech_sample.py in the GitHub repo.
  • Windows: Added compressed audio input format support on the Windows platform for all win32 console applications. Details here.
  • JavaScript: Added support for speech synthesis (text-to-speech) in NodeJS. Learn more here.
  • JavaScript: Added new APIs to enable inspection of all sent and received messages. Learn more here.

Bug fixes

  • C#, C++: Fixed an issue so that SendMessageAsync now sends binary messages as binary type. Details for C#, C++.
  • C#, C++: Fixed an issue where using the Connection MessageReceived event may cause a crash if the Recognizer is disposed before the Connection object. Details for C#, C++.
  • Android: Decreased the microphone audio buffer size from 800 ms to 100 ms to improve latency.
  • Android: Fixed an issue with the x86 Android emulator in Android Studio.
  • JavaScript: Added support for regions in China with the fromSubscription API. Details here.
  • JavaScript: Added more error information for connection failures from NodeJS.
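
To give a sense of scale for the Android buffer change listed above, this short Python sketch converts a buffer duration into bytes. The audio format used here (16 kHz, 16-bit, mono PCM) is an assumption for the arithmetic only; the release note does not state which format the microphone buffer uses.

```python
def buffer_bytes(duration_ms: int, sample_rate_hz: int = 16_000,
                 bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size in bytes of an uncompressed PCM buffer holding `duration_ms` of audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * duration_ms // 1000


old_size = buffer_bytes(800)  # assumed format: 16 kHz, 16-bit, mono
new_size = buffer_bytes(100)
print(old_size, new_size, old_size // new_size)
```

Under the assumed format the buffer shrinks by a factor of eight, which means audio can be handed off after 100 ms of capture rather than 800 ms.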

Samples

  • Unity: Fixed the intent recognition public sample, where the LUIS json import was failing. Details here.
  • Python: Added a sample for Language ID. Details here.

COVID-19 abridged testing: Due to working remotely over the last few weeks, we couldn't do as much manual device verification testing as we normally do. An example of this is testing microphone input and speaker output on Linux, iOS, and macOS. We haven't made any changes we think could have broken anything on these platforms, and our automated tests all passed. In the unlikely event that we missed something, please let us know on GitHub.
Thank you for your continued support. As always, please post questions or feedback on GitHub or Stack Overflow.
Stay healthy!

Speech SDK 1.10.0: 2020-February release

New features

  • Added Python packages to support the new 3.8 release of Python.
  • Red Hat Enterprise Linux (RHEL)/CentOS 8 x64 support (C++, C#, Java, Python).

    Note

    Customers must configure OpenSSL according to these instructions.

  • Linux ARM32 support for Debian and Ubuntu.
  • TTS now uses the subscription key for authentication, reducing the first-byte latency of the first synthesis result after creating a synthesizer.
  • Updated speech recognition models for 19 locales for an average word error rate reduction of 18.6% (es-ES, es-MX, fr-CA, fr-FR, it-IT, ja-JP, ko-KR, pt-BR, zh-CN, zh-HK, nb-NO, fi-FI, ru-RU, pl-PL, ca-ES, zh-TW, th-TH, pt-PT, tr-TR). The new models bring significant improvements across multiple domains including Dictation, Call-Center Transcription and Video Indexing scenarios.

Bug fixes

  • Fixed a bug where Conversation Transcriber did not await properly in the Java APIs.
  • Fixed the Android x86 emulator for Xamarin (GitHub issue).
  • Added missing (Get|Set)Property methods to AudioConfig.
  • Fixed a TTS bug where the audioDataStream could not be stopped when the connection fails.
  • Fixed an issue where using an endpoint without a region caused USP failures for the conversation translator.
  • ID generation in Universal Windows Applications now uses an appropriately unique GUID algorithm; it previously and unintentionally defaulted to a stubbed implementation that often produced collisions over large sets of interactions.
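
The effect described in the last item can be illustrated with a small Python sketch that compares random (version 4) UUIDs against a deliberately weak ID generator. The weak generator below is hypothetical and chosen only to make collisions visible; it is not the stubbed implementation the SDK previously used, which the note does not describe.

```python
import random
import uuid


def new_interaction_id() -> str:
    """Random (version 4) UUID, drawn from 122 random bits."""
    return str(uuid.uuid4())


def stubbed_interaction_id(rng: random.Random) -> str:
    """Hypothetical weak stub with only 1,000 possible values."""
    return f"id-{rng.randrange(1000):04d}"


rng = random.Random(0)
interactions = 10_000
strong_ids = {new_interaction_id() for _ in range(interactions)}
weak_ids = {stubbed_interaction_id(rng) for _ in range(interactions)}

print("unique UUIDs:    ", len(strong_ids))
print("unique stub IDs: ", len(weak_ids))  # cannot exceed 1,000
```

With 10,000 interactions, the weak generator must reuse IDs because it has at most 1,000 distinct values, whereas collisions among version 4 UUIDs are vanishingly unlikely at this scale.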

Samples

Other changes

Speech SDK 1.9.0: 2020-January release

New features

  • Objective-C: Added SendMessage and SetMessageProperty methods to the Connection object. See documentation here.
  • The TTS C++ API now supports std::wstring as synthesis text input, removing the need to convert a wstring to string before passing it to the SDK. See details here.
  • C#: Language ID and source language config are now available.
  • JavaScript: Added a feature to the Connection object to pass through custom messages from the Speech Service as the callback receivedServiceMessage.
  • JavaScript: Added support for the FromHost API to ease use with sovereign clouds.
  • JavaScript: We now honor NODE_TLS_REJECT_UNAUTHORIZED thanks to a contribution from orgads. See details here.

Breaking changes

  • OpenSSL has been updated to version 1.1.1b and is statically linked to the Speech SDK core library for Linux. This may cause a break if your inbox OpenSSL has not been installed to the /usr/lib/ssl directory in the system. Check our documentation under Speech SDK docs to work around the issue.
  • We have changed the data type returned for C# WordLevelTimingResult.Offset from int to long to allow access to WordLevelTimingResults when speech data is longer than 2 minutes.
  • PushAudioInputStream and PullAudioInputStream now send wav header information to the Speech Service based on the AudioStreamFormat that was optionally specified when they were created. Customers must now use the supported audio input format. Any other format will get sub-optimal recognition results or may cause other issues.
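
As background for the wav header change in the last item, the following Python sketch uses the standard library's `wave` module to show what header information travels with raw PCM audio. The format shown (16 kHz, 16-bit, mono PCM) is assumed here to be the supported input format; confirm it against the audio input format documentation. The helpers `pcm_to_wav` and `read_format` are illustrative and unrelated to the SDK's stream classes.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16_000,
               sample_width: int = 2, channels: int = 1) -> bytes:
    """Prefix raw PCM samples with a WAV (RIFF) header describing their format."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)  # bytes per sample
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buffer.getvalue()


def read_format(wav_bytes: bytes):
    """Return (channels, bits per sample, sample rate) from a WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return reader.getnchannels(), reader.getsampwidth() * 8, reader.getframerate()


one_second_of_silence = bytes(16_000 * 2)  # 16,000 samples x 2 bytes each
wav = pcm_to_wav(one_second_of_silence)
print(read_format(wav), len(wav) - len(one_second_of_silence), "header bytes")
```

If the header declares one format while the samples are in another, a receiver that trusts the header will misinterpret the audio, which is consistent with the note's warning about sub-optimal recognition results.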

Bug fixes

  • See the OpenSSL update under Breaking changes above. We fixed both an intermittent crash and a performance issue (lock contention under high load) in Linux and Java.
  • Java: Made improvements to object closure in high concurrency scenarios.
  • Restructured our NuGet package. We removed the three copies of Microsoft.CognitiveServices.Speech.core.dll and Microsoft.CognitiveServices.Speech.extension.kws.dll under lib folders, making the NuGet package smaller and faster to download, and we added headers needed to compile some C++ native apps.
  • Fixed the quickstart samples here. They were exiting without displaying the "microphone not found" exception on Linux, macOS, and Windows.
  • Fixed an SDK deployment error in the Azure Web App environment to address this customer issue.
  • Fixed a TTS error when using multiple <voice> tags or an <audio> tag to address this customer issue.
  • Fixed a TTS 401 error when the SDK is recovered from a suspended state.
  • JavaScript: Fixed a circular import of audio data thanks to a contribution from euirim.
  • JavaScript: Added support for setting service properties, as added in 1.7.
  • JavaScript: Fixed an issue where a connection error could result in continuous, unsuccessful websocket reconnect attempts.

Samples

  • Added a TTS sample for the server scenario here.

Other changes

  • Optimized the SDK core library size on Android.
  • Version 1.9.0 and later of the SDK support both int and string types in the voice signature version field for Conversation Transcriber.

Speech SDK 1.8.0: 2019-November release

New features

  • Added a FromHost() API to ease use with sovereign clouds.
  • Added Automatic Source Language Detection for Speech Recognition (in Java and C++).
  • Added the SourceLanguageConfig object for Speech Recognition, used to specify expected source languages (in Java and C++).
  • Added KeywordRecognizer support on Windows (UWP), Android and iOS through the NuGet and Unity packages.

Breaking changes

  • Conversation Transcriber functionality moved under the namespace Microsoft.CognitiveServices.Speech.Transcription.
  • Some of the Conversation Transcriber methods moved to the new Conversation class.
  • Dropped support for 32-bit (ARMv7 and x86) iOS.

Bug fixes

  • Fix for a crash if the local KeywordRecognizer is used without a valid Speech service subscription key.

Samples

  • Xamarin sample for KeywordRecognizer
  • Unity sample for KeywordRecognizer
  • C++ and Java samples for Automatic Source Language Detection

Speech SDK 1.7.0: 2019-September release

New features

  • Added beta support for Xamarin on Universal Windows Platform (UWP), Android, and iOS
  • Added iOS support for Unity
  • Added compressed input support for ALaw, Mulaw, and FLAC on Android, iOS and Linux
  • Added SendMessageAsync to the Connection class for sending a message to the service
  • Added SetMessageProperty to the Connection class for setting a property of a message
  • TTS added bindings for Java (JRE and Android), Python, Swift, and Objective-C
  • TTS added playback support for macOS, iOS, and Android
  • Added "word boundary" information for TTS

Bug fixes

  • Fixed an IL2CPP build issue on Unity 2019 for Android
  • Fixed an issue with malformed headers in wav file input being processed incorrectly
  • Fixed an issue with UUIDs not being unique in some connection properties
  • Fixed a few warnings about nullability specifiers in the Swift bindings (might require small code changes)
  • Fixed a bug that caused websocket connections to be closed ungracefully under network load
  • Fixed an issue on Android that sometimes resulted in duplicate impression IDs being used by DialogServiceConnector
  • Improved the stability of connections across multi-turn interactions and the reporting of failures (via Canceled events) when they occur with DialogServiceConnector
  • DialogServiceConnector session starts now properly provide events, including when calling ListenOnceAsync() during an active StartKeywordRecognitionAsync()
  • Addressed a crash associated with DialogServiceConnector activities being received

Samples

  • Quickstart for Xamarin
  • Updated the CPP quickstart with Linux ARM64 information
  • Updated the Unity quickstart with iOS information

Speech SDK 1.6.0: 2019-June release

Samples

  • Quickstart samples for Text To Speech on UWP and Unity
  • Quickstart sample for Swift on iOS
  • Unity samples for Speech & Intent Recognition and Translation
  • Updated quickstart samples for DialogServiceConnector

Improvements / Changes

  • Dialog namespace:
    • SpeechBotConnector has been renamed to DialogServiceConnector
    • BotConfig has been renamed to DialogServiceConfig
    • BotConfig::FromChannelSecret() has been remapped to DialogServiceConfig::FromBotSecret()
  • Updated the TTS REST adapter to support proxies and persistent connections
  • Improved the error message shown when an invalid region is passed
  • Swift/Objective-C:
    • Improved error reporting: Methods that can result in an error are now present in two versions: one that exposes an NSError object for error handling, and one that raises an exception. The former are exposed to Swift. This change requires adaptations to existing Swift code.
    • Improved event handling

Bug fixes

  • Fix for TTS where the SpeakTextAsync future returned without waiting until the audio had completed rendering
  • Fix for marshaling strings in C# to enable full language support
  • Fix for a .NET Core app problem loading the core library with the net461 target framework in samples
  • Fix for occasional issues deploying native libraries to the output folder in samples
  • Fix to make web sockets close reliably
  • Fix for a possible crash while opening a connection under very heavy load on Linux
  • Fix for missing metadata in the framework bundle for macOS
  • Fix for problems with pip install --user on Windows

Speech SDK 1.5.0: 2019-May release

New features

  • Phrase hint functionality is available through the SDK. For more information, see here.

Samples

  • Added samples for new features or new services supported by the SDK.

Improvements / Changes

  • Added various recognizer properties to adjust service behavior or service results (such as masking profanity).
  • You can now configure the recognizer through the standard configuration properties, even if you created the recognizer FromEndpoint.
  • Objective-C: The OutputFormat property was added to SPXSpeechConfiguration.
  • The SDK now supports Debian 9 as a Linux distribution.

Bug fixes

  • Fixed a problem where the speaker resource was destructed too early in text-to-speech.

Speech SDK 1.4.2

This is a bug fix release that affects only the native/managed SDK. It does not affect the JavaScript version of the SDK.

Speech SDK 1.4.1

This is a JavaScript-only release. No features have been added. The following fixes were made:

  • Prevent webpack from loading https-proxy-agent.

Speech SDK 1.4.0: 2019-April release

New features

  • The SDK now supports the text-to-speech service as a beta version. It is supported on Windows and Linux Desktop from C++ and C#. For more information, check the text-to-speech overview.
  • The SDK now supports MP3 and Opus/OGG audio files as stream input files. This feature is available only on Linux from C++ and C# and is currently in beta (more details here).
  • The Speech SDK for Java, .NET Core, C++, and Objective-C has gained macOS support. The Objective-C support for macOS is currently in beta.
  • iOS: The Speech SDK for iOS (Objective-C) is now also published as a CocoaPod.
  • JavaScript: Support for a non-default microphone as an input device.
  • JavaScript: Proxy support for Node.js.

Samples

  • Samples for using the Speech SDK with C++ and with Objective-C on macOS have been added.
  • Samples demonstrating the usage of the text-to-speech service have been added.

Improvements / Changes

  • Python: Additional properties of recognition results are now exposed via the properties property.
  • For additional development and debug support, you can redirect SDK logging and diagnostics information into a log file (more details here).
  • JavaScript: Improved audio processing performance.

Bug fixes

  • Mac/iOS: Fixed a bug that led to a long wait when a connection to the Speech service could not be established.
  • Python: Improved error handling for arguments in Python callbacks.
  • JavaScript: Fixed wrong state reporting for speech ended on RequestSession.

Speech SDK 1.3.1: 2019-February refresh

This is a bug fix release that affects only the native/managed SDK. It does not affect the JavaScript version of the SDK.

Bug fix

  • Fixed a memory leak when using microphone input. Stream-based or file input is not affected.

Speech SDK 1.3.0: 2019-February release

New Features

  • The Speech SDK supports selection of the input microphone through the AudioConfig class. This allows you to stream audio data to the Speech service from a non-default microphone. For more information, see the documentation describing audio input device selection. This feature is not yet available from JavaScript.
  • The Speech SDK now supports Unity in a beta version. Provide feedback through the issue section in the GitHub sample repository. This release supports Unity on Windows x86 and x64 (desktop or Universal Windows Platform applications), and Android (ARM32/64, x86). More information is available in our Unity quickstart.
  • The file Microsoft.CognitiveServices.Speech.csharp.bindings.dll (shipped in previous releases) isn't needed anymore. The functionality is now integrated into the core SDK.

Samples

The following new content is available in our sample repository:

  • Additional samples for AudioConfig.FromMicrophoneInput.
  • Additional Python samples for intent recognition and translation.
  • Additional samples for using the Connection object in iOS.
  • Additional Java samples for translation with audio output.
  • New sample for use of the Batch Transcription REST API.

Improvements / Changes

  • Python
    • Improved parameter verification and error messages in SpeechConfig.
    • Added support for the Connection object.
    • Support for 32-bit Python (x86) on Windows.
    • The Speech SDK for Python is out of beta.
  • iOS
    • The SDK is now built against the iOS SDK version 12.1.
    • The SDK now supports iOS versions 9.2 and later.
    • Improved the reference documentation and fixed several property names.
  • JavaScript
    • Added support for the Connection object.
    • Added type definition files for bundled JavaScript.
    • Initial support and implementation for phrase hints.
    • Return a properties collection with the service JSON for recognition.
  • Windows DLLs now contain a version resource.
  • If you create a recognizer FromEndpoint, you can add parameters directly to the endpoint URL. When using FromEndpoint, you can't configure the recognizer through the standard configuration properties.
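
As a rough illustration of the last item, the sketch below appends query parameters to an endpoint URL before it would be handed to FromEndpoint. It uses only the Python standard library. The helper name, the host, and the parameter names (language, format) are assumptions made for this example and are not taken from these release notes; check the service documentation for the parameters your endpoint actually accepts.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_params(endpoint: str, **params: str) -> str:
    """Return `endpoint` with `params` merged into its query string.

    Hypothetical helper: parameters already present in the URL are kept,
    and new ones are appended (or overwrite an existing key of the same name).
    """
    parts = urlsplit(endpoint)
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder host and illustrative parameter names (assumptions).
endpoint = with_query_params(
    "wss://example.invalid/speech/recognition",
    language="en-US",
    format="detailed",
)
print(endpoint)
```

The resulting string is what you would pass as the endpoint when creating the recognizer; the recognizer itself is not shown here because its construction depends on the SDK language binding you use.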

Bug fixes

  • Empty proxy username and proxy password were not handled correctly. With this release, if you set proxy username and proxy password to an empty string, they will not be submitted when connecting to the proxy.
  • Session IDs created by the SDK were not always truly random for some languages / environments. Added random generator initialization to fix this issue.
  • Improved handling of the authorization token. If you want to use an authorization token, specify it in the SpeechConfig and leave the subscription key empty. Then create the recognizer as usual.
  • In some cases the Connection object wasn't released correctly. This issue has been fixed.
  • The JavaScript sample was fixed to support audio output for translation synthesis on Safari as well.

Speech SDK 1.2.1

This is a JavaScript-only release. No features have been added. The following fixes were made:

  • Fire end of stream at turn.end, not at speech.end.
  • Fixed a bug in the audio pump that did not schedule the next send if the current send failed.
  • Fixed continuous recognition with an auth token.
  • Bug fix for different recognizer / endpoints.
  • Documentation improvements.

Speech SDK 1.2.0: 2018-December release

New Features

  • Python
    • The Beta version of Python support (3.5 and above) is available with this release. For more information, see here (quickstart-python.md).
  • JavaScript
    • The Speech SDK for JavaScript has been open-sourced. The source code is available on GitHub.
    • Node.js is now supported; more info can be found here.
    • The length restriction for audio sessions has been removed; reconnection happens automatically under the covers.
  • Connection object
    • From the Recognizer, you can access a Connection object. This object allows you to explicitly initiate the service connection and subscribe to connect and disconnect events. (This feature is not yet available from JavaScript and Python.)
  • Support for Ubuntu 18.04.
  • Android
    • Enabled ProGuard support during APK generation.

Improvements

  • Improvements in the internal thread usage, reducing the number of threads, locks, and mutexes.
  • Improved error reporting / information. In several cases, error messages were not propagated all the way out.
  • Updated development dependencies in JavaScript to use up-to-date modules.

Bug fixes

  • Fixed memory leaks due to a type mismatch in RecognizeAsync.
  • In some cases exceptions were being leaked.
  • Fixed a memory leak in translation event arguments.
  • Fixed a locking issue on reconnect in long-running sessions.
  • Fixed an issue that could lead to a missing final result for failed translations.
  • C#: If an async operation wasn't awaited in the main thread, it was possible the recognizer could be disposed before the async task was completed.
  • Java: Fixed a problem resulting in a crash of the Java VM.
  • Objective-C: Fixed enum mapping; RecognizedIntent was returned instead of RecognizingIntent.
  • JavaScript: Set default output format to 'simple' in SpeechConfig.
  • JavaScript: Removed an inconsistency between properties on the config object in JavaScript and other languages.

Samples

  • Updated and fixed several samples (for example, output voices for translation).
  • Added Node.js samples in the sample repository.

Speech SDK 1.1.0

New Features

  • Support for Android x86/x64.
  • Proxy Support: In the SpeechConfig object, you can now call a function to set the proxy information (hostname, port, username, and password). This feature is not yet available on iOS.
  • Improved error codes and messages. If a recognition returned an error, this already set Reason (in the canceled event) or CancellationDetails (in the recognition result) to Error. The canceled event now contains two additional members, ErrorCode and ErrorDetails. If the server returned additional error information with the reported error, it is now available in the new members.

Improvements

  • Added additional verification in the recognizer configuration, and added an additional error message.
  • Improved handling of long silences in the middle of an audio file.
  • NuGet package: for .NET Framework projects, it prevents building with the AnyCPU configuration.

Bug fixes

  • Fixed several exceptions found in recognizers. In addition, exceptions are caught and converted into a Canceled event.
  • Fixed a memory leak in property management.
  • Fixed a bug in which an audio input file could crash the recognizer.
  • Fixed a bug where events could be received after a session stop event.
  • Fixed some race conditions in threading.
  • Fixed an iOS compatibility issue that could result in a crash.
  • Stability improvements for Android microphone support.
  • Fixed a bug where a recognizer in JavaScript would ignore the recognition language.
  • Fixed a bug preventing setting the EndpointId (in some cases) in JavaScript.
  • Changed the parameter order in AddIntent in JavaScript, and added the missing AddIntent JavaScript signature.

Samples

  • Added C++ and C# samples for pull and push stream usage in the sample repository.

Speech SDK 1.0.1

Reliability improvements and bug fixes:

  • Fixed a potential fatal error due to a race condition in disposing the recognizer.
  • Fixed a potential fatal error in case of unset properties.
  • Added additional error and parameter checking.
  • Objective-C: Fixed a possible fatal error caused by name overriding in NSString.
  • Objective-C: Adjusted the visibility of the API.
  • JavaScript: Fixes regarding events and their payloads.
  • Documentation improvements.

In our sample repository, a new sample for JavaScript was added.

Cognitive Services Speech SDK 1.0.0: 2018-September release

New features

Breaking changes

  • With this release, a number of breaking changes are introduced. Check this page for details.

Cognitive Services Speech SDK 0.6.0: 2018-August release

New features

  • UWP apps built with the Speech SDK can now pass the Windows App Certification Kit (WACK). Check out the UWP quickstart.
  • Support for .NET Standard 2.0 on Linux (Ubuntu 16.04 x64).
  • Experimental: Support for Java 8 on Windows (64-bit) and Linux (Ubuntu 16.04 x64). Check out the Java Runtime Environment quickstart.

Functional change

  • Expose additional error detail information on connection errors.

Breaking changes

  • On Java (Android), the SpeechFactory.configureNativePlatformBindingWithDefaultCertificate function no longer requires a path parameter. The path is now automatically detected on all supported platforms.
  • The get-accessor of the property EndpointUrl in Java and C# was removed.

Bug fixes

  • In Java, the audio synthesis result on the translation recognizer is now implemented.
  • Fixed a bug that could cause inactive threads and an increased number of open and unused sockets.
  • Fixed a problem where a long-running recognition could terminate in the middle of the transmission.
  • Fixed a race condition in recognizer shutdown.

Cognitive Services Speech SDK 0.5.0: 2018-July release

New features

  • Support for the Android platform (API 23: Android 6.0 Marshmallow or higher). Check out the Android quickstart.
  • Support for .NET Standard 2.0 on Windows. Check out the .NET Core quickstart.
  • Experimental: Support for UWP on Windows (version 1709 or later).
    • Check out the UWP quickstart.
    • Note: UWP apps built with the Speech SDK do not yet pass the Windows App Certification Kit (WACK).
  • Support for long-running recognition with automatic reconnection.

Functional changes

  • StartContinuousRecognitionAsync() supports long-running recognition.
  • The recognition result contains more fields: the offset from the beginning of the audio and the duration of the recognized text (both in ticks), plus additional values that represent recognition status, for example, InitialSilenceTimeout and InitialBabbleTimeout.
  • Support AuthorizationToken for creating factory instances.
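
The offset and duration fields mentioned above are reported in ticks. The sketch below converts such values to ordinary time spans. It assumes one tick equals 100 nanoseconds (the .NET convention), which these notes do not state explicitly; the function name and the sample values are made up for illustration.

```python
from datetime import timedelta

# Assumption: one tick = 100 ns, so there are 10 ticks per microsecond.
TICKS_PER_MICROSECOND = 10


def ticks_to_timedelta(ticks: int) -> timedelta:
    """Convert a tick count to a timedelta, truncating below 1 microsecond."""
    return timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


# Hypothetical values as they might appear on a recognition result.
offset_ticks = 12_500_000    # phrase starts 1.25 s into the audio
duration_ticks = 30_000_000  # phrase lasts 3 s

start = ticks_to_timedelta(offset_ticks)
end = start + ticks_to_timedelta(duration_ticks)
print(f"phrase spans {start.total_seconds():.2f}s to {end.total_seconds():.2f}s")
```

Integer division keeps the conversion exact for whole microseconds and avoids floating-point rounding on long recordings.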

Breaking changes

  • Recognition events: The NoMatch event type was merged into the Error event.
  • SpeechOutputFormat in C# was renamed to OutputFormat to stay aligned with C++.
  • The return type of some methods of the AudioInputStream interface changed slightly:
    • In Java, the read method now returns long instead of int.
    • In C#, the Read method now returns uint instead of int.
    • In C++, the Read and GetFormat methods now return size_t instead of int.
  • C++: Instances of audio input streams can now be passed only as a shared_ptr.

Bug fixes

  • Fixed incorrect return values in the result when RecognizeAsync() times out.
  • The dependency on media foundation libraries on Windows was removed. The SDK now uses Core Audio APIs.
  • Documentation fix: Added a regions page to describe the supported regions.

Known issue

  • The Speech SDK for Android doesn't report speech synthesis results for translation. This issue will be fixed in the next release.

Cognitive Services Speech SDK 0.4.0: 2018-June release

Functional changes

  • AudioInputStream

    A recognizer can now consume a stream as the audio source. For more information, see the related how-to guide.

  • Detailed output format

    When you create a SpeechRecognizer, you can request Detailed or Simple output format. The DetailedSpeechRecognitionResult contains a confidence score, recognized text, raw lexical form, normalized form, and normalized form with masked profanity.

Breaking change

  • In C#, SpeechRecognitionResult.RecognizedText was renamed to SpeechRecognitionResult.Text.

Bug fixes

  • Fixed a possible callback issue in the USP layer during shutdown.
  • If a recognizer consumed an audio input file, it was holding on to the file handle longer than necessary.
  • Removed several deadlocks between the message pump and the recognizer.
  • Fire a NoMatch result when the response from the service times out.
  • The media foundation libraries on Windows are delay loaded. These libraries are required for microphone input only.
  • The upload speed for audio data is limited to about twice the original audio speed.
  • On Windows, C# .NET assemblies are now strong named.
  • Documentation fix: Region is required information to create a recognizer.
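
To make the upload-speed item concrete, here is a small back-of-the-envelope calculation. The 2x factor comes from the note above ("about twice"); the audio format used in the example (16 kHz, 16-bit, mono PCM) and the function names are assumptions chosen only for illustration.

```python
def max_upload_bytes_per_second(
    sample_rate_hz: int,
    bits_per_sample: int,
    channels: int,
    speed_factor: float = 2.0,
) -> float:
    """Approximate upload ceiling for raw PCM audio under the throttle."""
    real_time_rate = sample_rate_hz * (bits_per_sample // 8) * channels
    return real_time_rate * speed_factor


def min_upload_seconds(audio_seconds: float, speed_factor: float = 2.0) -> float:
    """Shortest time in which `audio_seconds` of audio can be uploaded."""
    return audio_seconds / speed_factor


# Assumed format: 16 kHz, 16-bit, mono PCM.
print(max_upload_bytes_per_second(16_000, 16, 1))  # bytes per second
print(min_upload_seconds(60.0))                    # seconds for a 1-minute file
```

Under these assumptions, a one-minute file takes at least about half a minute to upload, regardless of available bandwidth.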

More samples have been added and are constantly being updated. For the latest set of samples, see the Speech SDK samples GitHub repository.

Cognitive Services Speech SDK 0.2.12733: 2018-May release

This release is the first public preview release of the Cognitive Services Speech SDK.