What is Custom Speech?

Custom Speech is a set of online tools that allow you to evaluate and improve Microsoft's speech-to-text accuracy for your applications, tools, and products. All it takes to get started is a handful of test audio files. Follow the links below to start creating a custom speech-to-text experience.

What's in Custom Speech?

Before you can do anything with Custom Speech, you'll need an Azure account and a Speech service subscription. Once you have an account, you can prepare your data, train and test your models, inspect recognition quality, evaluate accuracy, and ultimately deploy and use the custom speech-to-text model.

This diagram highlights the pieces that make up the Custom Speech portal. Use the links below to learn more about each step.

Diagram: the different components that make up the Custom Speech portal.

  1. Subscribe and create a project - Create an Azure account and subscribe to the Speech service. This unified subscription gives you access to speech-to-text, text-to-speech, speech translation, and the Custom Speech portal. Then, using your Speech service subscription, create your first Custom Speech project.

  2. Upload test data - Upload test data (audio files) to evaluate Microsoft's speech-to-text offering for your applications, tools, and products.

  3. Inspect recognition quality - Use the Custom Speech portal to play back uploaded audio and inspect the speech recognition quality of your test data. For quantitative measurements, see Inspect data.

  4. Evaluate accuracy - Evaluate the accuracy of the speech-to-text model. The Custom Speech portal reports a Word Error Rate (WER), which you can use to decide whether additional training is required. If you're satisfied with the accuracy, you can use the Speech service APIs directly. If you'd like to improve accuracy by a relative average of 5% to 20%, use the Training tab in the portal to upload additional training data, such as human-labeled transcripts and related text.

  5. Improve accuracy - Choose additional training data strategically to improve the quality of the speech-to-text model based on your scenario.

  6. Train the model - Improve the accuracy of your speech-to-text model by providing written transcripts (10-1,000 hours) and related text (<200 MB) along with your audio test data. This data helps to train the speech-to-text model. After training, retest the model; if you're satisfied with the result, you can deploy it.

  7. Deploy the model - Create a custom endpoint for your speech-to-text model and use it in your applications, tools, or products.
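To make the accuracy step more concrete, here is a minimal Python sketch of how a Word Error Rate and a relative improvement can be computed. It assumes the standard definition of WER (substitutions, deletions, and insertions divided by the number of words in the human-labeled reference) and simple whitespace tokenization. The function names are illustrative, and the portal may normalize text differently before scoring, so its numbers can differ from this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, custom_wer: float) -> float:
    """Relative WER reduction of a custom model compared with a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be greater than zero")
    return (baseline_wer - custom_wer) / baseline_wer


# One deleted word ("the") and one substituted word ("lights" -> "light")
# against a five-word reference.
wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
print(f"WER: {wer:.0%}")

# Hypothetical numbers: a baseline WER of 20% that drops to 17% after training.
print(f"Relative improvement: {relative_improvement(0.20, 0.17):.0%}")
```

Note that the improvement is relative: under these assumed numbers, going from 20% to 17% WER is a 15% relative reduction, not a 3% one.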

Set up your Azure account

A Speech service subscription is required before you can use the Custom Speech portal to create a custom model. Follow these instructions to create a standard Speech service subscription: Create a Speech Subscription.

Note

Be sure to create a standard (S0) subscription. Free trial (F0) subscriptions are not supported.

Once you've created an Azure account and a Speech service subscription, you'll need to sign in to the Custom Speech portal and connect your subscription.

  1. Get your Speech service subscription key from the Azure portal.
  2. Sign in to the Custom Speech portal.
  3. Select the subscription you need to work on and create a speech project.
  4. If you'd like to modify your subscription, use the cog icon located in the top navigation.

How to create a project

Content such as data, models, tests, and endpoints is organized into Projects in the Custom Speech portal. Each project is specific to a domain and country/language. For example, you may create a project for call centers that use English in the United States.
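As a rough mental model of this organization, the sketch below represents a project as a container scoped to one domain and one locale. This is an illustration only: the `SpeechProject` class, its field names, and the example entity names are hypothetical and do not correspond to a Speech service API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeechProject:
    """Hypothetical model of a Custom Speech project (not a real API type)."""
    name: str
    domain: str   # for example, "call center"
    locale: str   # country/language, for example "en-US"
    datasets: List[str] = field(default_factory=list)
    models: List[str] = field(default_factory=list)
    tests: List[str] = field(default_factory=list)
    endpoints: List[str] = field(default_factory=list)


# A project for call centers that use English in the United States.
call_center = SpeechProject(
    name="US call center", domain="call center", locale="en-US"
)
call_center.datasets.append("support-calls-test-audio")  # hypothetical dataset name

print(call_center)
```

Because each project is tied to a single domain and locale, a second scenario, such as the same call center in another language, would be modeled as a separate project with its own data, models, tests, and endpoints.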

To create your first project, select the Speech-to-text/Custom speech tab, then click New Project. Follow the instructions provided by the wizard to create your project. After you've created a project, you should see four tabs: Data, Testing, Training, and Deployment. Use the links provided in Next steps to learn how to use each tab.

Important

The Custom Speech portal was recently updated! If you previously created data, models, tests, or published endpoints in the CRIS.ai portal or with APIs, you need to create a new project in the new portal to connect to these old entities.

Next steps