Custom Web API skill in an Azure Cognitive Search enrichment pipeline
The Custom Web API skill allows you to extend AI enrichment by calling out to a Web API endpoint that provides custom operations. Like built-in skills, a Custom Web API skill has inputs and outputs. When the indexer runs, your Web API receives a JSON payload built from the inputs and returns a JSON payload as a response, along with a success status code. The response is expected to contain the outputs specified by your custom skill. Any other response is considered an error, and no enrichments are performed.
The structure of the JSON payloads is described further down in this document.
Note
The indexer retries twice for certain standard HTTP status codes returned from the Web API. These HTTP status codes are:
502 Bad Gateway
503 Service Unavailable
429 Too Many Requests
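The retry rule in the note can be approximated locally when testing an endpoint. The following is a minimal sketch, not the indexer's actual implementation: `call_with_retries` and the `send` callable are hypothetical names, and reading "retry twice" as "at most three attempts in total" is an assumption.

```python
from typing import Callable, Tuple

# Status codes the note lists as retried by the indexer.
RETRYABLE_STATUS_CODES = {502, 503, 429}
# Assumption: "retry twice" means one initial attempt plus up to two retries.
MAX_RETRIES = 2


def call_with_retries(send: Callable[[], int]) -> Tuple[int, int]:
    """Call `send` until it returns a non-retryable status or retries run out.

    `send` is a hypothetical callable that performs one request and
    returns the HTTP status code. Returns (final_status, attempts_made).
    """
    attempts = 0
    while True:
        status = send()
        attempts += 1
        if status not in RETRYABLE_STATUS_CODES or attempts > MAX_RETRIES:
            return status, attempts
```

A real client would normally wait between attempts; the delay is omitted here because the document does not describe one.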
@odata.type
Microsoft.Skills.Custom.WebApiSkill
Skill parameters
Parameters are case-sensitive.
Parameter name | Description |
---|---|
uri | The URI of the Web API to which the JSON payload is sent. Only the https URI scheme is allowed. |
httpMethod | The method to use when sending the payload. Allowed methods are PUT or POST. |
httpHeaders | A collection of key-value pairs where the keys represent header names and the values represent header values that are sent to your Web API along with the payload. The following headers are prohibited in this collection: Accept, Accept-Charset, Accept-Encoding, Content-Length, Content-Type, Cookie, Host, TE, Upgrade, Via. |
timeout | (Optional) When specified, indicates the timeout for the HTTP client making the API call. It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an ISO 8601 duration value). For example, PT60S for 60 seconds. If not set, a default value of 30 seconds is used. The timeout can be set to a maximum of 230 seconds and a minimum of 1 second. |
batchSize | (Optional) Indicates how many "data records" (see the JSON payload structure below) are sent per API call. If not set, a default of 1000 is used. We recommend using this parameter to find a suitable tradeoff between indexing throughput and load on your API. |
degreeOfParallelism | (Optional) When specified, indicates the number of calls the indexer makes in parallel to the endpoint you have provided. You can decrease this value if your endpoint is failing under too high a request load, or raise it if your endpoint can accept more requests and you want to improve indexer performance. If not set, a default value of 5 is used. The degreeOfParallelism can be set to a maximum of 10 and a minimum of 1. |
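The parameter constraints above can be checked before a skill definition is submitted. The sketch below is one possible way to do that, not part of any Azure SDK: `validate_skill` and `duration_seconds` are hypothetical helpers, the duration parser is a simplified reading of XSD dayTimeDuration, and treating httpMethod as optional and batchSize as any integer of at least 1 are assumptions.

```python
import re
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse

# Header names the table lists as prohibited (compared case-insensitively).
PROHIBITED_HEADERS = {
    "accept", "accept-charset", "accept-encoding", "content-length",
    "content-type", "cookie", "host", "te", "upgrade", "via",
}

# Simplified dayTimeDuration pattern: PnDTnHnMnS with optional parts.
_DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def duration_seconds(value: str) -> Optional[float]:
    """Return the duration in seconds, or None if it cannot be parsed."""
    match = _DURATION.match(value)
    if match is None or not any(match.groups()):
        return None
    days, hours, minutes, seconds = (
        float(g) if g else 0.0 for g in match.groups()
    )
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def validate_skill(skill: Dict[str, Any]) -> List[str]:
    """Return a list of constraint violations (empty if none were found)."""
    problems: List[str] = []
    if urlparse(str(skill.get("uri", ""))).scheme != "https":
        problems.append("uri must use the https scheme")
    method = skill.get("httpMethod")
    if method is not None and method not in ("PUT", "POST"):
        problems.append("httpMethod must be PUT or POST")
    for name in skill.get("httpHeaders") or {}:
        if name.lower() in PROHIBITED_HEADERS:
            problems.append(f"header '{name}' is prohibited")
    timeout = skill.get("timeout")
    if timeout is not None:
        seconds = duration_seconds(timeout)
        if seconds is None or not 1 <= seconds <= 230:
            problems.append("timeout must be between PT1S and PT230S")
    batch = skill.get("batchSize")
    if batch is not None and (not isinstance(batch, int) or batch < 1):
        problems.append("batchSize must be a positive integer")
    dop = skill.get("degreeOfParallelism")
    if dop is not None and (not isinstance(dop, int) or not 1 <= dop <= 10):
        problems.append("degreeOfParallelism must be between 1 and 10")
    return problems
```

For example, `validate_skill({"uri": "http://example.com"})` reports the non-https scheme, while a definition that only sets an https `uri` and `batchSize` of 4 passes with no problems.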
Skill inputs
There are no "predefined" inputs for this skill. You can choose as inputs one or more fields that are already available when this skill executes; the fields in the JSON payload sent to the Web API vary accordingly.
Skill outputs
There are no "predefined" outputs for this skill. Depending on the response your Web API returns, add output fields so that they can be picked up from the JSON response.
Sample definition
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"description": "A custom skill that can identify positions of different phrases in the source text",
"uri": "https://contoso.count-things.com",
"batchSize": 4,
"context": "/document",
"inputs": [
{
"name": "text",
"source": "/document/content"
},
{
"name": "language",
"source": "/document/languageCode"
},
{
"name": "phraseList",
"source": "/document/keyphrases"
}
],
"outputs": [
{
"name": "hitPositions"
}
]
}
Sample input JSON structure
This JSON structure represents the payload that is sent to your Web API. It always follows these constraints:
- The top-level entity is called values and is an array of objects. The number of such objects is at most the batchSize.
- Each object in the values array has:
  - A recordId property that is a unique string, used to identify that record.
  - A data property that is a JSON object. The fields of the data property correspond to the "names" specified in the inputs section of the skill definition. The values of those fields come from the source of those fields (which could be a field in the document, or potentially the output of another skill).
{
"values": [
{
"recordId": "0",
"data":
{
"text": "Este es un contrato en Inglés",
"language": "es",
"phraseList": ["Este", "Inglés"]
}
},
{
"recordId": "1",
"data":
{
"text": "Hello world",
"language": "en",
"phraseList": ["Hi"]
}
},
{
"recordId": "2",
"data":
{
"text": "Hello world, Hi world",
"language": "en",
"phraseList": ["world"]
}
},
{
"recordId": "3",
"data":
{
"text": "Test",
"language": "es",
"phraseList": []
}
}
]
}
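To exercise an endpoint without running an indexer, a test harness can build payloads of the same shape. The following is a minimal sketch under stated assumptions: `build_payloads` is a hypothetical helper, and using the record's position as its recordId is only one way to obtain a unique string; the document does not say how the indexer chooses recordId values.

```python
from typing import Any, Dict, Iterator, List


def build_payloads(
    records: List[Dict[str, Any]], batch_size: int = 1000
) -> Iterator[Dict[str, Any]]:
    """Yield request payloads holding at most `batch_size` records each.

    Each element of `records` becomes the `data` object of one entry in
    the `values` array. The recordId is the record's overall position,
    rendered as a string (an assumption made for this sketch).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        yield {
            "values": [
                {"recordId": str(start + offset), "data": data}
                for offset, data in enumerate(batch)
            ]
        }
```

With a batch size of 4, five input records produce two payloads: one with four entries in values and one with a single entry.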
Sample output JSON structure
The "output" corresponds to the response returned from your Web API. The Web API should only return a JSON payload (verified by looking at the Content-Type response header) and should satisfy the following constraints:
- There should be a top-level entity called values, which should be an array of objects.
- The number of objects in the array should be the same as the number of objects sent to the Web API.
- Each object should have:
  - A recordId property.
  - A data property, which is an object whose fields are enrichments matching the "names" in the outputs section of the skill definition and whose values are considered the enrichments.
  - An errors property, an array listing any errors encountered, which are added to the indexer execution history. This property is required, but can have a null value.
  - A warnings property, an array listing any warnings encountered, which are added to the indexer execution history. This property is required, but can have a null value.
- The objects in the values array need not be in the same order as the objects in the values array sent as a request to the Web API. However, the recordId is used for correlation, so any record in the response containing a recordId that was not part of the original request to the Web API is discarded.
{
"values": [
{
"recordId": "3",
"data": {
},
"errors": [
{
"message" : "'phraseList' should not be null or empty"
}
],
"warnings": null
},
{
"recordId": "2",
"data": {
"hitPositions": [6, 16]
},
"errors": null,
"warnings": null
},
{
"recordId": "0",
"data": {
"hitPositions": [0, 23]
},
"errors": null,
"warnings": null
},
{
"recordId": "1",
"data": {
"hitPositions": []
},
"errors": null,
"warnings": [
{
"message": "No occurrences of 'Hi' were found in the input text"
}
]
}
]
}
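A Web API that produces responses of this shape for the sample skill could look roughly like the sketch below. It is an illustration only: the document does not show the service behind the sample, so `find_all`, `process_record`, and `process_request` are hypothetical, and the matching rule (case-sensitive search for every occurrence of each phrase) is an assumption chosen because it yields the hit positions shown in the sample.

```python
from typing import Any, Dict, List


def find_all(text: str, phrase: str) -> List[int]:
    """Return the start index of every occurrence of `phrase` in `text`."""
    positions = []
    start = text.find(phrase)
    while start != -1:
        positions.append(start)
        start = text.find(phrase, start + 1)
    return positions


def process_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Build one response entry, always including errors and warnings."""
    data = record.get("data") or {}
    phrases = data.get("phraseList")
    if not phrases:
        return {
            "recordId": record["recordId"],
            "data": {},
            "errors": [{"message": "'phraseList' should not be null or empty"}],
            "warnings": None,
        }
    text = data.get("text") or ""
    hits: List[int] = []
    warnings: List[Dict[str, str]] = []
    for phrase in phrases:
        found = find_all(text, phrase)
        if not found:
            warnings.append({
                "message": f"No occurrences of '{phrase}' were found in the input text"
            })
        hits.extend(found)
    return {
        "recordId": record["recordId"],
        "data": {"hitPositions": sorted(hits)},
        "errors": None,
        "warnings": warnings or None,
    }


def process_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return one response entry per request entry, keyed by recordId."""
    return {"values": [process_record(r) for r in payload["values"]]}
```

The dictionary returned by `process_request` would still need to be serialized as JSON and sent with a Content-Type of application/json by whatever web framework hosts the function.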
Error cases
In addition to your Web API being unavailable or sending out non-successful status codes, the following are considered erroneous cases:
- If the Web API returns a success status code but the response indicates that it is not application/json, then the response is considered invalid and no enrichments are performed.
- If there are invalid records in the response values array (records whose recordId is not in the original request, or records with duplicate recordId values), no enrichment is performed for those records.
For cases when the Web API is unavailable or returns an HTTP error, a friendly error with any available details about the HTTP error is added to the indexer execution history.
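The error rules above can be mirrored in a local check that compares a response with the request that produced it. This is a sketch of one interpretation, not the indexer's actual logic: `correlate_response` is a hypothetical helper, and dropping every record that shares a duplicated recordId (rather than keeping the first) is an assumption.

```python
from collections import Counter
from typing import Any, Dict


def correlate_response(
    request: Dict[str, Any], response: Dict[str, Any], content_type: str
) -> Dict[str, Dict[str, Any]]:
    """Return the enrichment data accepted for each request recordId.

    Raises ValueError when the response is not declared as JSON, in which
    case no enrichment is performed at all. Records with an unknown or
    duplicated recordId are skipped.
    """
    if "application/json" not in content_type.lower():
        raise ValueError("response is not application/json")
    sent_ids = {value["recordId"] for value in request["values"]}
    returned = response.get("values") or []
    counts = Counter(value.get("recordId") for value in returned)
    accepted: Dict[str, Dict[str, Any]] = {}
    for value in returned:
        record_id = value.get("recordId")
        if record_id in sent_ids and counts[record_id] == 1:
            accepted[record_id] = value.get("data") or {}
    return accepted
```

Because correlation uses recordId only, the order of the entries in the response does not affect the result.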