Speech to Text frequently asked questions

If you can't find answers to your questions in this FAQ, check out the other support options.


Q: What is the difference between a baseline model and a custom Speech to Text model?

A: A baseline model has been trained by using Microsoft-owned data and is already deployed in the cloud. You can use a custom model to adapt a baseline model so that it better fits an environment that has specific ambient noise or language. Factory floors, cars, or noisy streets would require an adapted acoustic model. Topics like biology, physics, radiology, product names, and custom acronyms would require an adapted language model.

Q: Where do I start if I want to use a baseline model?

A: First, get a subscription key. If you want to make REST calls to the predeployed baseline models, see the REST APIs. If you want to use WebSockets, download the SDK.

Q: Do I always need to build a custom speech model?

A: No. If your application uses generic, day-to-day language, you don't need to customize a model. If your application is used in an environment where there's little or no background noise, you don't need to customize a model.

You can deploy baseline and customized models in the portal and then run accuracy tests against them. You can use this feature to measure the accuracy of a baseline model versus a custom model.

Q: How will I know when processing for my dataset or model is complete?

A: Currently, the status of the model or dataset in the table is the only way to know. When the processing is complete, the status is Succeeded.

Q: Can I create more than one model?

A: Yes. There's no limit on the number of models you can have in your collection.

Q: I realized I made a mistake. How do I cancel a data import or model creation that's in progress?

A: Currently, you can't roll back an acoustic or language adaptation process. You can delete imported data and models when they're in a terminal state.

Q: What's the difference between the Search and Dictation model and the Conversational model?

A: You can choose from more than one baseline model in the Speech service. The Conversational model is useful for recognizing speech that is spoken in a conversational style. This model is ideal for transcribing phone calls. The Search and Dictation model is ideal for voice-triggered apps. The Universal model is a new model that aims to address both scenarios. The Universal model is currently at or above the quality level of the Conversational model in most locales.

Q: Can I update my existing model (model stacking)?

A: You can't update an existing model. As a solution, combine the old dataset with the new dataset and readapt.

The old dataset and the new dataset must be combined in a single .zip file (for acoustic data) or in a single .txt file (for language data). When adaptation is finished, the new, updated model needs to be redeployed to obtain a new endpoint.
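As an illustration of the language-data case, the following is a minimal sketch that concatenates an old and a new .txt dataset into one file before readaptation. The function name and file paths are hypothetical, and the sketch assumes one utterance per line in UTF-8; it is not part of the Speech service or its SDK. Duplicate lines are kept on purpose, because repeated utterances can be intentional.

```python
from pathlib import Path


def merge_language_data(old_path, new_path, out_path):
    """Concatenate two utterance files into one .txt file.

    Assumes one utterance per line. Blank lines are dropped;
    duplicates are preserved. Returns the number of lines written.
    """
    lines = []
    for path in (old_path, new_path):
        text = Path(path).read_text(encoding="utf-8")
        # Keep every non-empty line, trimmed of surrounding whitespace.
        lines.extend(line.strip() for line in text.splitlines() if line.strip())
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

The merged file is then uploaded as a new language dataset. Acoustic data would need the equivalent step for the contents of the two .zip files.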

Q: When a new version of a baseline is available, is my deployment automatically updated?

A: No. Deployments are not updated automatically.

If you have adapted and deployed a model with baseline V1.0, that deployment will remain as is. You can decommission the deployed model, readapt by using the newer version of the baseline, and redeploy.

Q: Can I download my model and run it locally?

A: No. Models can't be downloaded and executed locally.

Q: Are my requests logged?

A: By default, requests are not logged (neither audio nor transcription). If required, you can select the Log content from this endpoint option when you create a custom endpoint to enable tracing. Requests are then logged in secure storage in Azure.

Q: Are my requests throttled?

A: See Speech Services Quotas and Limits.

Q: How am I charged for dual channel audio?

A: If you submit each channel separately (each channel in its own file), you are charged for the duration of each file. If you submit a single file with the channels multiplexed together, you are charged for the duration of that single file. For details on pricing, see the Azure Cognitive Services pricing page.
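The arithmetic behind this answer can be sketched as follows. The function name is hypothetical, and the sketch only computes the billable audio duration as described above; it does not reflect actual rates, rounding, or minimum charges, which are on the pricing page.

```python
def billable_seconds(file_durations):
    """Return the total billable duration for a submission.

    file_durations holds one entry per submitted file, in seconds.
    Each file is charged for its own duration, so the total is the sum.
    """
    return sum(file_durations)


call_length = 600  # a 10-minute two-channel recording, in seconds

# Each channel submitted in its own file: both files are charged.
separate = billable_seconds([call_length, call_length])

# Both channels multiplexed into one file: only that file is charged.
multiplexed = billable_seconds([call_length])
```

Under these assumptions, submitting the channels separately doubles the billable duration compared with one multiplexed file.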


If you have further privacy concerns that prevent you from using the custom Speech service, contact one of the support channels.

Increasing concurrency

See Speech Services Quotas and Limits.

Importing data

Q: What is the limit on the size of a dataset, and why is it the limit?

A: The limit is due to the restriction on the size of a file for HTTP upload. See Speech Services Quotas and Limits for the actual limit.

Q: Can I zip my text files so I can upload a larger text file?

A: No. Currently, only uncompressed text files are allowed.

Q: The data report says there were failed utterances. What is the issue?

A: Failing to upload 100 percent of the utterances in a file is not a problem. If the vast majority of the utterances in an acoustic or language dataset (for example, more than 95 percent) are successfully imported, the dataset can be usable. However, we recommend that you try to understand why the utterances failed and fix the problems. Most common problems, such as formatting errors, are easy to fix.
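The rule of thumb above can be written as a quick check. This is a minimal sketch with a hypothetical function name; the 95 percent threshold is only the example figure from the answer, not a limit enforced by the service, and the counts are assumed to be read from the data report by hand.

```python
def dataset_is_usable(imported, total, threshold_percent=95):
    """Return True when more than threshold_percent of utterances imported.

    Uses integer arithmetic to avoid floating-point comparison issues.
    """
    if total <= 0:
        raise ValueError("total must be a positive utterance count")
    if not 0 <= imported <= total:
        raise ValueError("imported must be between 0 and total")
    return imported * 100 > threshold_percent * total
```

For example, `dataset_is_usable(970, 1000)` is true, while `dataset_is_usable(950, 1000)` is false because exactly 95 percent is not more than 95 percent.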

Creating an acoustic model

Q: How much acoustic data do I need?

A: We recommend starting with between 30 minutes and one hour of acoustic data.

Q: What data should I collect?

A: Collect data that's as close to the application scenario and use case as possible. The data collection should match the target application and users in terms of devices, environments, and types of speakers. In general, you should collect data from as broad a range of speakers as possible.

Q: How should I collect acoustic data?

A: You can create a standalone data collection application or use off-the-shelf audio recording software. You can also create a version of your application that logs the audio data and then uses the data.

Q: Do I need to transcribe adaptation data myself?

A: Yes. You can transcribe it yourself or use a professional transcription service. Some users prefer professional transcribers, and others use crowdsourcing or do the transcriptions themselves.

Accuracy testing

Q: Can I perform offline testing of my custom acoustic model by using a custom language model?

A: Yes, just select the custom language model in the drop-down menu when you set up the offline test.

Q: Can I perform offline testing of my custom language model by using a custom acoustic model?

A: Yes, just select the custom acoustic model in the drop-down menu when you set up the offline test.

Q: What is word error rate (WER) and how is it computed?

A: WER is the evaluation metric for speech recognition. WER is computed as the total number of errors (insertions, deletions, and substitutions) divided by the total number of words in the reference transcription.
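The definition above can be sketched in a few lines of Python. This is an illustrative implementation, not the code the service uses: it assumes whitespace tokenization and case-sensitive matching, and it counts errors with a standard word-level edit distance. The function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return (insertions + deletions + substitutions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # prev[j] is the minimum number of edits that turn the reference
    # words seen so far into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For the reference "turn on the light" and the hypothesis "turn the lights", there is one deletion ("on") and one substitution ("light" to "lights"), so the WER is 2 / 4 = 0.5. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.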

Q: How do I determine whether the results of an accuracy test are good?

A: The results show a comparison between the baseline model and the model you customized. You should aim to beat the baseline model to make customization worthwhile.

Q: How do I determine the WER of a base model so I can see if there was an improvement?

A: The offline test results show the accuracy of the baseline that the custom model was adapted from, and the improvement over that baseline.

Creating a language model

Q: How much text data do I need to upload?

A: It depends on how different the vocabulary and phrases used in your application are from the starting language models. For all new words, it's useful to provide as many examples as possible of the usage of those words. For common phrases that are used in your application, including the phrases in the language data is also useful because it tells the system to listen for these terms as well. It's common to have at least 100, and typically several hundred or more, utterances in the language dataset. Also, if some types of queries are expected to be more common than others, you can insert multiple copies of the common queries in the dataset.
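Inserting multiple copies of common queries can be automated when the dataset is generated. The following is a minimal sketch with a hypothetical function name and made-up example utterances; the copy counts are arbitrary illustrations, and the source gives no guidance on specific values.

```python
def build_language_dataset(utterance_copies):
    """Expand a mapping of utterance -> number of copies into dataset lines.

    Utterances expected to be more common get a higher copy count,
    so they appear more often in the language data.
    """
    lines = []
    for utterance, copies in utterance_copies.items():
        if copies < 1:
            raise ValueError(f"copy count for {utterance!r} must be at least 1")
        lines.extend([utterance] * copies)
    return lines


dataset_lines = build_language_dataset({
    "check my account balance": 3,       # expected to be a frequent query
    "order a replacement debit card": 1, # expected to be rare
})
```

Writing `"\n".join(dataset_lines)` to an uncompressed .txt file gives one utterance per line, with the frequent query repeated three times.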

Q: Can I just upload a list of words?

A: Uploading a list of words adds the words to the vocabulary, but it doesn't teach the system how the words are typically used. When you provide full or partial utterances (sentences or phrases that users are likely to say), the language model can learn the new words and how they are used. The custom language model is good not only for adding new words to the system, but also for adjusting the likelihood of known words for your application. Providing full utterances helps the system learn better.

Next steps