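Detect and redact personally identifiable information in conversations

The Azure Language conversational PII API analyzes audio discourse to identify and redact sensitive information (PII) using a set of predefined categories. The API works on both transcribed text (referred to as transcripts) and chats. For transcripts, it also makes it possible to redact the audio segments that contain PII by returning the audio timing information for those segments.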

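Determine how to process the data (optional)

Specify the PII detection model

By default, this feature uses the latest available AI model on your input. You can also configure your API requests to use a specific model version.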

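Language support

See the PII language support page for more information. Currently, the conversational PII GA model supports only English. The preview model and API support the same list of languages as the other Language service features.

Region support

The conversational PII API supports all Azure regions supported by the Azure Language service.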

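Submitting data

You can submit the input to the API as a list of conversation items. Analysis is performed upon receipt of the request. Because the API is asynchronous, there might be a delay between sending an API request and receiving the results. For information on the size and number of requests you can send per minute and per second, see the data limits below.
When you use the asynchronous feature, the API results are available for 24 hours from the time the request was ingested; that time is indicated in the response. After this period, the results are purged and can no longer be retrieved.
When you submit data to conversational PII, you can send one conversation (chat or spoken) per request.
The API attempts to detect all the defined entity categories for a given conversation input. If you want to specify which entities are detected and returned, use the optional piiCategories parameter with the appropriate entity categories.
For spoken transcripts, the detected entities are returned based on the redactionSource parameter value that you provide. Currently, the supported values for redactionSource are text, lexical, itn, and maskedItn (which map to the Speech to text REST API's display\displayText, lexical, itn, and maskedItn formats, respectively). For spoken transcript input, the API also provides audio timing information to support audio redaction. To use the audioRedaction feature, set the optional includeAudioRedaction flag to true. Audio redaction is performed based on the lexical input format.
A conversation can contain a list of conversation items (turns). The maximum limit of 1,000 applies to each conversation item, not to the conversation as a whole:
- Example of a multi-turn conversation
  - (conv item1) User: Hello!
  - (conv item2) Bot: Hello, how can I help you?
  - (conv item3) User: When does the next train leave Paris?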
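
The optional parameters described above can be sketched in Python as follows. This is a minimal illustration, not a client library: the helper names build_pii_task and build_conversation are hypothetical, the conversation id is a placeholder, and the "Name" category is only an example value that should be checked against the supported entity categories.

```python
import json

# Values for redactionSource that the text above lists as supported.
REDACTION_SOURCES = {"text", "lexical", "itn", "maskedItn"}


def build_pii_task(pii_categories=None, redaction_source=None,
                   include_audio_redaction=False, task_name="analyze 1"):
    """Build one ConversationalPIITask entry for the "tasks" array (hypothetical helper)."""
    parameters = {}
    if pii_categories is not None:
        parameters["piiCategories"] = list(pii_categories)
    if redaction_source is not None:
        if redaction_source not in REDACTION_SOURCES:
            raise ValueError(f"unsupported redactionSource: {redaction_source!r}")
        parameters["redactionSource"] = redaction_source
    if include_audio_redaction:
        parameters["includeAudioRedaction"] = True
    return {"taskName": task_name, "kind": "ConversationalPIITask", "parameters": parameters}


def build_conversation(turns, conversation_id="conversation-1", language="en", modality="text"):
    """Turn (participantId, text) pairs into one conversation with numbered items."""
    items = [
        {"participantId": participant, "id": str(index), "text": text}
        for index, (participant, text) in enumerate(turns, start=1)
    ]
    return {"id": conversation_id, "language": language,
            "modality": modality, "conversationItems": items}


turns = [
    ("user", "Hello!"),
    ("bot", "Hello, how can I help you?"),
    ("user", "When does the next train leave Paris?"),
]
body = {
    "displayName": "Multi-turn example",
    "analysisInput": {"conversations": [build_conversation(turns)]},
    # "Name" is an example category only; one conversation is sent per request.
    "tasks": [build_pii_task(pii_categories=["Name"])],
}
print(json.dumps(body, indent=2))
```

Rejecting an unknown redactionSource locally is a design choice of this sketch; it surfaces a typo before an asynchronous job is queued.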

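Getting PII results

When you get results from PII detection, you can stream them to an application or save the output to a file on the local system. The API response includes the recognized entities, with their categories and subcategories, and confidence scores. The text string with the PII entities redacted is also returned.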
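
As a rough sketch of consuming such a response, the Python below summarizes one conversation result. The text above does not spell out the response schema, so the field names used here (conversationItems, redactedContent, entities, category, confidenceScore) and the sample values are assumptions for illustration; summarize_pii is a hypothetical helper.

```python
# Hypothetical result fragment for one conversation. Field names and values
# are assumptions for illustration, not a documented schema.
sample_conversation_result = {
    "id": "23611680-c4eb-4705-adef-4aa1c17507b5",
    "conversationItems": [
        {"id": "1", "redactedContent": {"text": "Good morning."}, "entities": []},
        {
            "id": "3",
            "redactedContent": {"text": "Sure that is ********."},
            "entities": [
                {"text": "John Doe", "category": "Name",
                 "offset": 13, "length": 8, "confidenceScore": 0.9},
            ],
        },
    ],
}


def summarize_pii(conversation_result, min_confidence=0.0):
    """Return (item id, redacted text, [(category, text, score), ...]) per item."""
    summary = []
    for item in conversation_result.get("conversationItems", []):
        entities = [
            (entity["category"], entity["text"], entity["confidenceScore"])
            for entity in item.get("entities", [])
            if entity["confidenceScore"] >= min_confidence
        ]
        redacted = item.get("redactedContent", {}).get("text")
        summary.append((item["id"], redacted, entities))
    return summary


for item_id, redacted, entities in summarize_pii(sample_conversation_result, min_confidence=0.5):
    print(item_id, redacted, entities)
```

Filtering on the confidence score is optional; a threshold of 0.5 is an arbitrary example value.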

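Go to your resource overview page in the Azure portal.
From the menu on the left side, select "Keys and Endpoint". You need one of the keys and the endpoint to authenticate your API requests.
Download and install the client library package for your language of choice: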

Language	Package version
.NET	1.0.0
Python	1.0.0

See the following reference documentation for more information on the client and the return object:
- C#
- Python

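Redaction policy (version 2024-11-15-preview only)

In version 2024-11-15-preview, you can define the redactionPolicy parameter to specify which redaction policy is used when the document is redacted in the response. The policy field supports three policy types:

noMask
characterMask (default)
entityMask

The noMask policy returns the response without the redactedText field.

The characterMask policy masks the redactedText with a character, preserving the length and offsets of the original text. This is the existing, expected behavior.

There is also an optional field called redactionCharacter, where you can specify the character to use for redaction when the characterMask policy is in effect.

The entityMask policy masks the detected PII entity text with the detected entity type.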
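
To make the difference between the three policies concrete, here is a small local simulation in Python. It does not call the service: apply_redaction_policy is a hypothetical helper, the entity offsets are supplied by hand, and the bracketed "[Name]" form used for entityMask is an assumed rendering, since the text above does not show the exact output format.

```python
def apply_redaction_policy(text, entities, policy_kind="characterMask", redaction_character="*"):
    """Locally mimic the three redaction policies on `text`.

    `entities` is a list of dicts with "offset", "length", and "category";
    spans are assumed not to overlap. Returns the redacted string, or None
    for noMask (where no redacted text is returned).
    """
    if policy_kind == "noMask":
        return None
    if policy_kind not in ("characterMask", "entityMask"):
        raise ValueError(f"unknown policyKind: {policy_kind!r}")
    pieces = []
    cursor = 0
    for entity in sorted(entities, key=lambda e: e["offset"]):
        start = entity["offset"]
        end = start + entity["length"]
        pieces.append(text[cursor:start])
        if policy_kind == "characterMask":
            # One mask character per original character keeps length and offsets.
            pieces.append(redaction_character * entity["length"])
        else:
            # Assumed rendering of the entity type; the real format may differ.
            pieces.append(f"[{entity['category']}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Sure that is John Doe."
entities = [{"offset": 13, "length": 8, "category": "Name"}]
print(apply_redaction_policy(text, entities))                # Sure that is ********.
print(apply_redaction_policy(text, entities, "entityMask"))  # Sure that is [Name].
print(apply_redaction_policy(text, entities, "noMask"))      # None
```

Only characterMask keeps the output the same length as the input, which is why offsets from the original text remain valid under that policy.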

Use the following example if you want to change the redaction policy.

curl -i -X POST https://your-language-endpoint-here/language/analyze-conversations/jobs?api-version=2024-11-15-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: your-key-here" \
-d \
'
{
    "displayName": "Analyze conversations from xxx",
    "analysisInput": {
        "conversations": [
            {
                "id": "23611680-c4eb-4705-adef-4aa1c17507b5",
                "language": "en",
                "modality": "text",
                "conversationItems": [
                    {
                        "participantId": "agent_1",
                        "id": "1",
                        "text": "Good morning."
                    },
                    {
                        "participantId": "agent_1",
                        "id": "2",
                        "text": "Can I have your name?"
                    },
                    {
                        "participantId": "customer_1",
                        "id": "3",
                        "text": "Sure that is John Doe."
                    }
                ]
            }
        ]
    },
    "tasks": [
        {
            "taskName": "analyze 1",
            "kind": "ConversationalPIITask",
            "parameters": {
                "modelVersion": "2023-04-15-preview",
                "redactionPolicy": {
                    "policyKind": "characterMask",
                    "redactionCharacter": "*"
                }
            }
        }
    ]
}
'

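Submit transcripts using speech to text

Use the following example if you have conversations transcribed with the Speech service's speech to text feature: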

curl -i -X POST https://your-language-endpoint-here/language/analyze-conversations/jobs?api-version=2024-05-01 \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: your-key-here" \
-d \
'
{
    "displayName": "Analyze conversations from xxx",
    "analysisInput": {
        "conversations": [
            {
                "id": "23611680-c4eb-4705-adef-4aa1c17507b5",
                "language": "en",
                "modality": "transcript",
                "conversationItems": [
                    {
                        "participantId": "agent_1",
                        "id": "8074caf7-97e8-4492-ace3-d284821adacd",
                        "text": "Good morning.",
                        "lexical": "good morning",
                        "itn": "good morning",
                        "maskedItn": "good morning",
                        "audioTimings": [
                            {
                                "word": "good",
                                "offset": 11700000,
                                "duration": 2100000
                            },
                            {
                                "word": "morning",
                                "offset": 13900000,
                                "duration": 3100000
                            }
                        ]
                    },
                    {
                        "participantId": "agent_1",
                        "id": "0d67d52b-693f-4e34-9881-754a14eec887",
                        "text": "Can I have your name?",
                        "lexical": "can i have your name",
                        "itn": "can i have your name",
                        "maskedItn": "can i have your name",
                        "audioTimings": [
                            {
                                "word": "can",
                                "offset": 44200000,
                                "duration": 2200000
                            },
                            {
                                "word": "i",
                                "offset": 46500000,
                                "duration": 800000
                            },
                            {
                                "word": "have",
                                "offset": 47400000,
                                "duration": 1500000
                            },
                            {
                                "word": "your",
                                "offset": 49000000,
                                "duration": 1500000
                            },
                            {
                                "word": "name",
                                "offset": 50600000,
                                "duration": 2100000
                            }
                        ]
                    },
                    {
                        "participantId": "customer_1",
                        "id": "08684a7a-5433-4658-a3f1-c6114fcfed51",
                        "text": "Sure that is John Doe.",
                        "lexical": "sure that is john doe",
                        "itn": "sure that is john doe",
                        "maskedItn": "sure that is john doe",
                        "audioTimings": [
                            {
                                "word": "sure",
                                "offset": 5400000,
                                "duration": 6300000
                            },
                            {
                                "word": "that",
                                "offset": 13600000,
                                "duration": 2300000
                            },
                            {
                                "word": "is",
                                "offset": 16000000,
                                "duration": 1300000
                            },
                            {
                                "word": "john",
                                "offset": 17400000,
                                "duration": 2500000
                            },
                            {
                                "word": "doe",
                                "offset": 20000000,
                                "duration": 2700000
                            }
                        ]
                    }
                ]
            }
        ]
    },
    "tasks": [
        {
            "taskName": "analyze 1",
            "kind": "ConversationalPIITask",
            "parameters": {
                "modelVersion": "2023-04-15-preview",
                "redactionSource": "text",
                "includeAudioRedaction": true,
                "piiCategories": [
                    "all"
                ]
            }
        }
    ]
}
'
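
Because audio redaction is based on the lexical form and the per-word audioTimings, it can help to build both from the same word list. The Python sketch below does that for one transcript item. build_transcript_item and timings_match_lexical are hypothetical helpers, and reusing the lexical string for itn and maskedItn is a simplifying assumption that only holds for utterances like the ones in the example above.

```python
def build_transcript_item(participant_id, item_id, display_text, word_timings):
    """Build one transcript conversation item (hypothetical helper).

    `word_timings` is a list of (word, offset, duration) tuples. The lexical
    form is derived by joining the words; itn and maskedItn reuse it, which is
    a simplification for utterances without numbers or masked words.
    """
    lexical = " ".join(word for word, _offset, _duration in word_timings)
    return {
        "participantId": participant_id,
        "id": item_id,
        "text": display_text,
        "lexical": lexical,
        "itn": lexical,
        "maskedItn": lexical,
        "audioTimings": [
            {"word": word, "offset": offset, "duration": duration}
            for word, offset, duration in word_timings
        ],
    }


def timings_match_lexical(item):
    """True when audioTimings lists exactly the words of the lexical form, in order."""
    return [timing["word"] for timing in item["audioTimings"]] == item["lexical"].split()


item = build_transcript_item(
    "agent_1",
    "8074caf7-97e8-4492-ace3-d284821adacd",
    "Good morning.",
    [("good", 11700000, 2100000), ("morning", 13900000, 3100000)],
)
print(item["lexical"])              # good morning
print(timings_match_lexical(item))  # True
```

Running timings_match_lexical on items assembled elsewhere is a cheap sanity check before setting includeAudioRedaction to true.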

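Submit text chats

Use the following example if your conversations originated as text, for example conversations held through a text-based chat client.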

curl -i -X POST https://your-language-endpoint-here/language/analyze-conversations/jobs?api-version=2024-05-01 \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: your-key-here" \
-d \
'
{
    "displayName": "Analyze conversations from xxx",
    "analysisInput": {
        "conversations": [
            {
                "id": "23611680-c4eb-4705-adef-4aa1c17507b5",
                "language": "en",
                "modality": "text",
                "conversationItems": [
                    {
                        "participantId": "agent_1",
                        "id": "8074caf7-97e8-4492-ace3-d284821adacd",
                        "text": "Good morning."
                    },
                    {
                        "participantId": "agent_1",
                        "id": "0d67d52b-693f-4e34-9881-754a14eec887",
                        "text": "Can I have your name?"
                    },
                    {
                        "participantId": "customer_1",
                        "id": "08684a7a-5433-4658-a3f1-c6114fcfed51",
                        "text": "Sure that is John Doe."
                    }
                ]
            }
        ]
    },
    "tasks": [
        {
            "taskName": "analyze 1",
            "kind": "ConversationalPIITask",
            "parameters": {
                "modelVersion": "2023-04-15-preview"
            }
        }
    ]
}
'
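
The same text-chat request can be prepared from Python with the standard library alone. This is a sketch under stated assumptions: build_job_request is a hypothetical helper, the endpoint and key are the placeholders from the cURL example, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_VERSION = "2024-05-01"  # same version as the cURL example above


def build_job_request(endpoint, key, body, api_version=API_VERSION):
    """Prepare (but do not send) the POST request that submits an analysis job."""
    url = f"{endpoint.rstrip('/')}/language/analyze-conversations/jobs?api-version={api_version}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


body = {
    "displayName": "Analyze conversations from xxx",
    "analysisInput": {
        "conversations": [
            {
                "id": "23611680-c4eb-4705-adef-4aa1c17507b5",
                "language": "en",
                "modality": "text",
                "conversationItems": [
                    {"participantId": "agent_1", "id": "1", "text": "Good morning."},
                    {"participantId": "agent_1", "id": "2", "text": "Can I have your name?"},
                    {"participantId": "customer_1", "id": "3", "text": "Sure that is John Doe."},
                ],
            }
        ]
    },
    "tasks": [
        {
            "taskName": "analyze 1",
            "kind": "ConversationalPIITask",
            "parameters": {"modelVersion": "2023-04-15-preview"},
        }
    ],
}

request = build_job_request("https://your-language-endpoint-here", "your-key-here", body)
print(request.get_method(), request.full_url)

# To actually submit the job, with a real endpoint and key:
# with urllib.request.urlopen(request) as response:
#     print(response.headers["operation-location"])
```

Keeping construction separate from sending makes the payload easy to inspect or log before any data leaves the machine.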

Get the result

Get the operation-location from the response header. The value looks similar to the following URL:

https://your-language-endpoint/language/analyze-conversations/jobs/12345678-1234-1234-1234-12345678

To get the results of the request, use the following cURL command. Be sure to replace my-job-id with the ID value you received in the operation-location response header:

curl -X GET https://your-language-endpoint/language/analyze-conversations/jobs/my-job-id \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: your-key-here"
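
Extracting the job ID from the operation-location header can be sketched in Python as below. The helper names are hypothetical, and the URL is the sample value shown above; any query string on the header value is ignored.

```python
from urllib.parse import urlsplit


def job_id_from_operation_location(operation_location):
    """Return the last path segment of the operation-location URL."""
    path = urlsplit(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


def results_url(endpoint, job_id):
    """Build the URL used by the GET request above for a given job ID."""
    return f"{endpoint.rstrip('/')}/language/analyze-conversations/jobs/{job_id}"


location = ("https://your-language-endpoint/language/analyze-conversations/jobs/"
            "12345678-1234-1234-1234-12345678")
job_id = job_id_from_operation_location(location)
print(job_id)  # 12345678-1234-1234-1234-12345678
print(results_url("https://your-language-endpoint", job_id))
```

Since results are kept for 24 hours after the request is ingested, the job ID is worth storing only for that window.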

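Service and data limits

For information on the size and number of requests you can send per minute and per second, see the service limits article.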

Last updated on 2026-06-10