Guidelines for responsible deployment of synthetic voice technology

Here are Microsoft's general design guidelines for using synthetic voice technology. Microsoft developed them through studies conducted with voice talent, consumers, and individuals with speech disorders, to guide the responsible development of synthetic voice.

General considerations

When deploying synthetic voice technology, the following guidelines apply across most scenarios.

Disclose when the voice is synthetic

Disclosing that a voice is computer generated not only minimizes the risk of harmful outcomes from deception but also increases trust in the organization delivering the voice. Learn more about how to disclose.

Select appropriate voice types for your scenario

Carefully consider the context of use and the potential harms associated with using synthetic voice. For example, high-fidelity synthetic voices may not be appropriate in high-risk scenarios, such as personal messaging, financial transactions, or complex situations that require human adaptability or empathy. Users may also have different expectations for voice types. For example, when listening to sensitive news read by a synthetic voice, some users prefer a more empathetic and human-like reading, while others prefer a more monotone, unbiased voice. Consider testing your application to better understand user preferences.

Be transparent about capabilities and limitations

Users are more likely to have higher expectations when interacting with high-fidelity synthetic voice agents. Consequently, when system capabilities don't meet those expectations, trust can suffer, and the result may be an unpleasant or even harmful experience.

Provide optional human support

In ambiguous, transactional scenarios (for example, a call support center), users don't always trust a computer agent to respond appropriately to their requests. Human support may be necessary in these situations, regardless of how realistic the voice is or how capable the system is.

Considerations for voice talent

When working with voice talent, such as voice actors, to create synthetic voices, the following guideline applies.

Voice talent expect to have control over their voice font (how and where it will be used) and to be compensated whenever it is used. System owners should therefore obtain explicit written permission from voice talent, and have clear contractual specifications on use cases, duration of use, compensation, and so on. Some voice talent are unaware of the potential malicious uses of the technology, so system owners should educate them about its capabilities. For more on voice talent and consent, read our Disclosure for Voice Talent.

Considerations for those with speech disorders

When working with individuals with speech disorders to create or deploy synthetic voice technology, the following guidelines apply.

Provide guidelines to establish contracts

Provide guidelines for establishing contracts with individuals who use synthetic voice for assistance in speaking. The contract should consider specifying the parties who own the voice, duration of use, ownership transfer criteria, procedures for deleting the voice font, and how to prevent unauthorized access. Additionally, enable the contractual transfer of voice font ownership to family members after death, if the person has given permission.
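As a rough illustration only, the contract terms listed above could be tracked in a small record like the one below. This is a minimal sketch, not part of Microsoft's guidance or any Microsoft API: the class name `VoiceFontContract`, all of its field names, and the `may_use` check are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class VoiceFontContract:
    """Hypothetical record of the contract terms named in the guideline."""
    owners: list[str]                  # parties who own the voice
    use_start: date                    # start of the agreed duration of use
    use_end: Optional[date]            # None means no end date was agreed
    transfer_criteria: str             # conditions for transferring ownership
    deletion_procedure: str            # how the voice font is to be deleted
    authorized_users: set[str] = field(default_factory=set)
    # Set only if the person has given permission for transfer after death.
    posthumous_transfer_to: Optional[str] = None

    def may_use(self, user: str, on: date) -> bool:
        """Return True if `user` is authorized and `on` is within the duration of use."""
        if user not in self.authorized_users:
            return False
        if on < self.use_start:
            return False
        return self.use_end is None or on <= self.use_end
```

A record like this only mirrors what the written contract says; the contract itself remains the authoritative source for ownership, duration, and deletion terms.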

Account for inconsistencies in speech patterns

For individuals with speech disorders who record their own voice fonts, inconsistencies in their speech patterns (slurring or inability to pronounce certain words) may complicate the recording process. In these cases, synthetic voice technology and recording sessions should accommodate them (that is, provide breaks and additional recording sessions).

Allow modification over time

Individuals with speech disorders want to be able to update their synthetic voice to reflect aging (for example, a child reaching puberty). Their stylistic preferences may also change over time, and they may want to change pitch, accent, or other voice characteristics.
