What is Custom Speech?

Custom Speech is a set of UI-based tools that let you evaluate and improve the accuracy of Microsoft speech-to-text for your applications and products. All it takes to get started is a handful of test audio files. Follow the links in this article to start creating a custom speech-to-text experience.

What's in Custom Speech?

Before you can do anything with Custom Speech, you'll need an Azure account and a Speech service subscription. After you have an account, you can prepare your data, train and test your models, inspect recognition quality, evaluate accuracy, and ultimately deploy and use the custom speech-to-text model.

This diagram highlights the pieces that make up the Custom Speech portal. Use the links below to learn more about each step.


  1. Subscribe and create a project - Create an Azure account and subscribe to the Speech service. This unified subscription gives you access to speech-to-text, text-to-speech, speech translation, and the Custom Speech portal. Then use your Speech service subscription to create your first Custom Speech project.

  2. Upload test data - Upload test data (audio files) to evaluate the Microsoft speech-to-text offering for your applications, tools, and products.

  3. Inspect recognition quality - Use the Custom Speech portal to play back uploaded audio and inspect the speech recognition quality of your test data. For quantitative measurements, see Inspect data.

  4. Evaluate and improve accuracy - Evaluate and improve the accuracy of the speech-to-text model. The Custom Speech portal reports a Word Error Rate, which you can use to decide whether additional training is required. If you're satisfied with the accuracy, you can use the Speech service APIs directly. If you want to improve accuracy (typically a relative improvement of 5% to 20% on average), use the Training tab in the portal to upload additional training data, such as human-labeled transcripts and related text.

  5. Train and deploy a model - Improve the accuracy of your speech-to-text model by providing written transcripts (10 to 1,000 hours) and related text (<200 MB) along with your audio test data. This data helps to train the speech-to-text model. After training, retest. If you're satisfied with the result, you can deploy your model to a custom endpoint.
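The Word Error Rate mentioned in step 4 is conventionally defined as the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript. The following is a minimal sketch of that standard definition, not the portal's own implementation; the function names `word_error_rate` and `relative_improvement` are hypothetical, and no text normalization (casing, punctuation) is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, e.g. 0.20 -> 0.16 is a 20% relative improvement."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Comparing `word_error_rate` for the same test audio before and after training, and passing both values to `relative_improvement`, gives a figure comparable to the 5% to 20% relative range quoted above.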

Set up your Azure account

You need an Azure account and a Speech service subscription before you can use the Custom Speech portal to create a custom model. If you don't have an account and subscription, try the Speech service for free.


Be sure to create a standard (S0) subscription. Free (F0) subscriptions aren't supported.

After you create an Azure account and a Speech service subscription, sign in to the Custom Speech portal and connect your subscription.

  1. Sign in to the Custom Speech portal.
  2. Select the subscription you need to work in and create a speech project.
  3. If you want to modify your subscription, select the cog button in the top menu.

How to create a project

Content such as data, models, tests, and endpoints is organized into projects in the Custom Speech portal. Each project is specific to a domain and country/language. For example, you might create a project for call centers that use English in the United States.

To create your first project, select Speech-to-text/Custom speech, and then select New Project. Follow the instructions provided by the wizard to create your project. After you create a project, you should see four tabs: Data, Testing, Training, and Deployment. Use the links provided in Next steps to learn how to use each tab.


The Custom Speech portal was recently updated! If you previously created data, models, tests, and published endpoints in the cris.azure.cn portal or with APIs, you need to create a new project in the new portal to connect to these old entities.

Model lifecycle

Custom Speech uses both base models and custom models. Each language has one or more base models. Generally, when a new speech model is released to the regular speech service, it's also imported to the Custom Speech service as a new base model. Base models are updated every 3 to 6 months. Older models typically become less useful over time because the newest model usually has higher accuracy.

In contrast, custom models are created by adapting a chosen base model to a particular customer scenario. Once you have a custom model that meets your needs, you can keep using it for a long time. However, we recommend that you periodically update to the latest base model and retrain it over time with additional data.

Other key terms related to the model lifecycle include:

  • Adaptation: Taking a base model and customizing it to your domain/scenario by using text data and/or audio data.
  • Decoding: Using a model to perform speech recognition (decoding audio into text).
  • Endpoint: A user-specific deployment of either a base model or a custom model that's accessible only to a given user.

Expiration timeline

As new models and new functionality become available, older, less accurate models are retired. The following timelines describe model and endpoint expiration:

Base models

  • Adaptation: Available for one year. After the model is imported, it's available for one year to create custom models. After one year, new custom models must be created from a newer base model version.
  • Decoding: Available for two years after import. So you can create an endpoint and use batch transcription for two years with this model.
  • Endpoints: Available on the same timeline as decoding.

Custom models

  • Decoding: Available for two years after the model is created. So you can use the custom model for two years (batch/realtime/testing) after it's created. After two years, you should retrain your model because the base model will usually have been deprecated for adaptation.
  • Endpoints: Available on the same timeline as decoding.
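The timelines above reduce to simple date arithmetic from the import date (base models) or the creation date (custom models). The sketch below illustrates that arithmetic only; the function names are hypothetical, and the authoritative expiration dates are the ones reported by the service, which may not match a naive calendar calculation exactly.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Add whole calendar years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start is Feb 29 and the target year is not a leap year
        return start.replace(year=start.year + years, day=28)


def base_model_expirations(imported: date) -> dict:
    """Base model: adaptation for one year, decoding/endpoints for two years."""
    return {
        "adaptation": add_years(imported, 1),
        "decoding": add_years(imported, 2),
        "endpoints": add_years(imported, 2),  # same timeline as decoding
    }


def custom_model_expirations(created: date) -> dict:
    """Custom model: decoding and endpoints for two years after creation."""
    return {
        "decoding": add_years(created, 2),
        "endpoints": add_years(created, 2),  # same timeline as decoding
    }
```

For example, under these assumptions a base model imported on 2020-03-01 could be used for adaptation until 2021-03-01 and for decoding until 2022-03-01.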

When either a base model or a custom model expires, it always falls back to the newest base model version. So your implementation will never break, but it might become less accurate for your specific data if custom models reach expiration. You can see the expiration for a model in the following places in the Custom Speech portal:

  • Model training summary
  • Model training detail
  • Deployment summary
  • Deployment detail

You can also check the expiration dates via the GetModel and GetBaseModel custom speech APIs under the deprecationDates property in the JSON response.
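As a rough sketch of how a client might read that property, the code below parses a model response and checks whether a date has passed. Only the `deprecationDates` property name comes from the text above; the nested field names `adaptationDateTime` and `transcriptionDateTime`, the sample payload, and the helper names are assumptions for illustration, so check the actual API response before relying on them.

```python
import json
from datetime import datetime, timezone

# Hypothetical response body; real responses contain more fields.
SAMPLE_RESPONSE = """
{
  "displayName": "example model",
  "deprecationDates": {
    "adaptationDateTime": "2021-04-15T00:00:00Z",
    "transcriptionDateTime": "2022-04-15T00:00:00Z"
  }
}
"""


def parse_deprecation_dates(response_text: str) -> dict:
    """Return the deprecationDates entries as timezone-aware datetimes."""
    payload = json.loads(response_text)
    raw_dates = payload.get("deprecationDates", {})
    return {
        # fromisoformat on older Python versions rejects a trailing "Z",
        # so convert it to an explicit UTC offset first.
        name: datetime.fromisoformat(value.replace("Z", "+00:00"))
        for name, value in raw_dates.items()
    }


def is_expired(when: datetime, now: datetime = None) -> bool:
    """True if the given deprecation date is at or before `now` (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now >= when
```

A scheduled job could call `parse_deprecation_dates` on each model's response and flag any model whose dates are close to expiring, so that retraining happens before the fallback to the newest base model takes effect.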

Note that you can upgrade the model on a Custom Speech endpoint without downtime by changing the model used by the endpoint, either in the Deployment section of the Custom Speech portal or via the Custom Speech API.

Next steps