Custom Web API skill in an Azure Cognitive Search enrichment pipeline

The Custom Web API skill allows you to extend AI enrichment by calling out to a Web API endpoint that provides custom operations. Similar to built-in skills, a Custom Web API skill has inputs and outputs. Depending on the inputs, your Web API receives a JSON payload when the indexer runs, and returns a JSON payload as a response, along with a success status code. The response is expected to contain the outputs specified by your custom skill. Any other response is considered an error and no enrichments are performed.

The structure of the JSON payloads is described further down in this document.

Note

The indexer retries twice for certain standard HTTP status codes returned from the Web API. These HTTP status codes are:

  • 502 Bad Gateway
  • 503 Service Unavailable
  • 429 Too Many Requests
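To make the retry rule concrete, here is a minimal Python model of it, assuming that "retries twice" means at most three attempts in total (one initial call plus two retries). `call_with_retries` and the `send` callable are hypothetical names used only for illustration; any delay the indexer may apply between attempts is not described here and is not modeled.

```python
from typing import Callable, Tuple

# Status codes that the indexer retries, per the list above.
RETRYABLE_STATUS_CODES = {429, 502, 503}
# "Retries twice": one initial attempt plus at most two retries (assumed).
MAX_RETRIES = 2


def call_with_retries(send: Callable[[], int]) -> Tuple[int, int]:
    """Hypothetical model of the retry rule.

    `send` performs one call and returns its HTTP status code.
    Returns (final_status_code, number_of_attempts).
    """
    attempts = 0
    while True:
        status = send()
        attempts += 1
        # Stop on any non-retryable code, or once both retries are used up.
        if status not in RETRYABLE_STATUS_CODES or attempts > MAX_RETRIES:
            return status, attempts


if __name__ == "__main__":
    responses = iter([503, 429, 200])
    print(call_with_retries(lambda: next(responses)))  # (200, 3)
```

One practical consequence for your Web API: when it is temporarily overloaded, returning 503 or 429 gives the indexer a chance to try again, whereas codes outside the list above, such as 500, are not among those retried.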

@odata.type

Microsoft.Skills.Custom.WebApiSkill

Skill parameters

Parameters are case-sensitive.

| Parameter name | Description |
|---|---|
| uri | The URI of the Web API to which the JSON payload will be sent. Only the https URI scheme is allowed. |
| httpMethod | The method to use while sending the payload. Allowed methods are PUT or POST. |
| httpHeaders | A collection of key-value pairs where the keys represent header names and the values represent header values that will be sent to your Web API along with the payload. The following headers are prohibited in this collection: Accept, Accept-Charset, Accept-Encoding, Content-Length, Content-Type, Cookie, Host, TE, Upgrade, Via. |
| timeout | (Optional) When specified, indicates the timeout for the HTTP client making the API call. It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an ISO 8601 duration value); for example, PT60S for 60 seconds. If not set, a default value of 30 seconds is used. The timeout can be set to a minimum of 1 second and a maximum of 230 seconds. |
| batchSize | (Optional) Indicates how many "data records" (see the JSON payload structure below) will be sent per API call. If not set, a default of 1000 is used. We recommend using this parameter to achieve a suitable tradeoff between indexing throughput and load on your API. |
| degreeOfParallelism | (Optional) When specified, indicates the number of calls the indexer will make in parallel to the endpoint you have provided. Decrease this value if your endpoint fails under too high a request load, or raise it if your endpoint can accept more requests and you want better indexer performance. If not set, a default value of 5 is used. degreeOfParallelism can be set to a minimum of 1 and a maximum of 10. |
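The limits in this table can be checked before you create the skill. The following Python sketch is a hypothetical client-side validator, not part of any Azure SDK. It handles only the basic PnDTnHnMnS form of dayTimeDuration (no negative durations), compares header names case-insensitively because HTTP header names are case-insensitive, and assumes batchSize must be a positive integer, since only its default is documented above.

```python
import re
from urllib.parse import urlparse

# Headers that may not appear in httpHeaders (compared case-insensitively).
PROHIBITED_HEADERS = {
    "accept", "accept-charset", "accept-encoding", "content-length",
    "content-type", "cookie", "host", "te", "upgrade", "via",
}

# Simplified dayTimeDuration: P[nD][T[nH][nM][n[.n]S]]
_DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def parse_day_time_duration(value: str) -> float:
    """Convert a duration such as 'PT60S' or 'PT1M30S' to seconds."""
    match = _DURATION.match(value)
    # "P" and "PT" match the pattern but contain no components, so reject them.
    if match is None or value == "P" or value.endswith("T"):
        raise ValueError(f"not a dayTimeDuration: {value!r}")
    days, hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def validate_skill_parameters(skill: dict) -> dict:
    """Check the documented limits and return the effective settings,
    with documented defaults filled in. Raises ValueError on a violation."""
    if urlparse(skill.get("uri", "")).scheme != "https":
        raise ValueError("uri must use the https scheme")
    if "httpMethod" in skill and skill["httpMethod"] not in ("PUT", "POST"):
        raise ValueError("httpMethod must be PUT or POST")
    for header in skill.get("httpHeaders", {}):
        if header.lower() in PROHIBITED_HEADERS:
            raise ValueError(f"header {header!r} is not allowed in httpHeaders")

    timeout = parse_day_time_duration(skill.get("timeout", "PT30S"))
    if not 1 <= timeout <= 230:
        raise ValueError("timeout must be between 1 and 230 seconds")

    batch_size = skill.get("batchSize", 1000)
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("batchSize must be a positive integer (assumed)")

    parallelism = skill.get("degreeOfParallelism", 5)
    if not isinstance(parallelism, int) or not 1 <= parallelism <= 10:
        raise ValueError("degreeOfParallelism must be between 1 and 10")

    return {
        "timeoutSeconds": timeout,
        "batchSize": batch_size,
        "degreeOfParallelism": parallelism,
    }
```

For a definition that sets only uri (https://contoso.count-things.com) and batchSize 4, this sketch returns a 30-second timeout, a batchSize of 4 and the default degreeOfParallelism of 5.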

Skill inputs

There are no "predefined" inputs for this skill. You can choose as inputs one or more fields that are already available at the time this skill executes; the data object of each record in the JSON payload sent to the Web API then contains one field per input, named after that input.

Skill outputs

There are no "predefined" outputs for this skill. Depending on the response your Web API returns, add output fields so that they can be picked up from the JSON response.

Sample definition

  {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "description": "A custom skill that can identify positions of different phrases in the source text",
        "uri": "https://contoso.count-things.com",
        "batchSize": 4,
        "context": "/document",
        "inputs": [
          {
            "name": "text",
            "source": "/document/content"
          },
          {
            "name": "language",
            "source": "/document/languageCode"
          },
          {
            "name": "phraseList",
            "source": "/document/keyphrases"
          }
        ],
        "outputs": [
          {
            "name": "hitPositions"
          }
        ]
      }

Sample input JSON structure

This JSON structure represents the payload that will be sent to your Web API. It always follows these constraints:

  • The top-level entity is called values and is an array of objects. The number of such objects is at most the batchSize.
  • Each object in the values array has:
    • A recordId property: a unique string used to identify that record.
    • A data property, which is a JSON object. Its fields correspond to the "names" specified in the inputs section of the skill definition, and their values come from the source of each input (which could be a field in the document, or potentially the output of another skill).
{
    "values": [
      {
        "recordId": "0",
        "data":
           {
             "text": "Este es un contrato en Inglés",
             "language": "es",
             "phraseList": ["Este", "Inglés"]
           }
      },
      {
        "recordId": "1",
        "data":
           {
             "text": "Hello world",
             "language": "en",
             "phraseList": ["Hi"]
           }
      },
      {
        "recordId": "2",
        "data":
           {
             "text": "Hello world, Hi world",
             "language": "en",
             "phraseList": ["world"]
           }
      },
      {
        "recordId": "3",
        "data":
           {
             "text": "Test",
             "language": "es",
             "phraseList": []
           }
      }
    ]
}
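How a payload like the one above is assembled can be sketched in Python: each record's data object gets one field per skill input, resolved from that input's source, and records are grouped into payloads of at most batchSize. This is a hypothetical model for testing a Web API locally, not the indexer's code. It assumes the enriched document is a plain nested dict, supports only simple /document/... paths (no * collections), and numbers recordId values "0", "1", ... as in the sample; your Web API should treat real recordId values as opaque strings.

```python
import json
from typing import Any, Dict, Iterator, List


def resolve_source(document: Dict[str, Any], source: str) -> Any:
    """Resolve a simple path such as '/document/content' against a dict."""
    parts = source.strip("/").split("/")
    if parts[0] != "document":
        raise ValueError(f"unsupported source path: {source}")
    value: Any = document
    for part in parts[1:]:
        # Missing fields resolve to None rather than raising.
        value = value.get(part) if isinstance(value, dict) else None
    return value


def build_payloads(
    documents: List[Dict[str, Any]],
    inputs: List[Dict[str, str]],
    batch_size: int,
) -> Iterator[Dict[str, Any]]:
    """Yield request bodies {"values": [...]} with at most batch_size records."""
    records = [
        {
            "recordId": str(index),
            "data": {inp["name"]: resolve_source(doc, inp["source"]) for inp in inputs},
        }
        for index, doc in enumerate(documents)
    ]
    for start in range(0, len(records), batch_size):
        yield {"values": records[start:start + batch_size]}


if __name__ == "__main__":
    skill_inputs = [
        {"name": "text", "source": "/document/content"},
        {"name": "language", "source": "/document/languageCode"},
        {"name": "phraseList", "source": "/document/keyphrases"},
    ]
    docs = [{"content": "Hello world", "languageCode": "en", "keyphrases": ["Hi"]}]
    for payload in build_payloads(docs, skill_inputs, batch_size=4):
        print(json.dumps(payload, indent=2))
```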

Sample output JSON structure

The "output" corresponds to the response returned from your Web API. The Web API should only return a JSON payload (verified by looking at the Content-Type response header) and should satisfy the following constraints:

  • There should be a top-level entity called values, which should be an array of objects.
  • The number of objects in the array should be the same as the number of objects sent to the Web API.
  • Each object should have:
    • A recordId property.
    • A data property: an object whose fields are enrichments matching the "names" in the outputs section, and whose values are considered the enrichments.
    • An errors property: an array listing any errors encountered, which will be added to the indexer execution history. This property is required, but can have a null value.
    • A warnings property: an array listing any warnings encountered, which will be added to the indexer execution history. This property is required, but can have a null value.
  • The objects in the values array need not be in the same order as the objects in the values array sent as a request to the Web API. However, the recordId is used for correlation, so any record in the response containing a recordId that was not part of the original request to the Web API will be discarded.
{
    "values": [
        {
            "recordId": "3",
            "data": {
            },
            "errors": [
              {
                "message" : "'phraseList' should not be null or empty"
              }
            ],
            "warnings": null
        },
        {
            "recordId": "2",
            "data": {
                "hitPositions": [6, 16]
            },
            "errors": null,
            "warnings": null
        },
        {
            "recordId": "0",
            "data": {
                "hitPositions": [0, 23]
            },
            "errors": null,
            "warnings": null
        },
        {
            "recordId": "1",
            "data": {
                "hitPositions": []
            },
            "errors": null,
            "warnings": [
              {
                "message": "No occurrences of 'Hi' were found in the input text"
              }
            ]
        }
    ]
}
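A Web API that produces responses like the one above could be sketched in Python as follows; `handle_request` is a hypothetical function that takes the parsed request body and returns the response body. The matching rules are assumptions inferred from the sample: the search is case-sensitive, every non-overlapping occurrence of every phrase is reported, positions are sorted, the language field is ignored, and a phrase with no matches produces a warning. Hosting the function behind an HTTPS endpoint is not shown.

```python
from typing import Any, Dict, List


def find_positions(text: str, phrase: str) -> List[int]:
    """Start offsets of every non-overlapping, case-sensitive occurrence."""
    positions = []
    start = text.find(phrase)
    while start != -1:
        positions.append(start)
        start = text.find(phrase, start + len(phrase))
    return positions


def handle_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a request body {"values": [...]} into a response body."""
    results = []
    for record in payload["values"]:
        data = record.get("data") or {}
        text = data.get("text") or ""
        # Empty strings are skipped; "language" is not used by this sketch.
        phrases = [p for p in (data.get("phraseList") or []) if p]
        if not phrases:
            results.append({
                "recordId": record["recordId"],
                "data": {},
                "errors": [{"message": "'phraseList' should not be null or empty"}],
                "warnings": None,
            })
            continue
        hits: List[int] = []
        warnings = []
        for phrase in phrases:
            found = find_positions(text, phrase)
            if not found:
                warnings.append({
                    "message": f"No occurrences of '{phrase}' were found in the input text"
                })
            hits.extend(found)
        results.append({
            "recordId": record["recordId"],
            "data": {"hitPositions": sorted(hits)},
            # errors and warnings are always present; null when there are none.
            "errors": None,
            "warnings": warnings or None,
        })
    return {"values": results}
```

Run on the sample input, this sketch yields the same hitPositions, error message and warning message as the sample output, but in request order; that is acceptable because the indexer correlates records by recordId rather than by position.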

Error cases

In addition to your Web API being unavailable or sending out non-successful status codes, the following are considered erroneous cases:

  • If the Web API returns a success status code but the response indicates that it is not application/json, the response is considered invalid and no enrichments are performed.
  • If there are invalid records in the response values array (with a recordId that was not in the original request, or with a duplicated recordId), no enrichment is performed for those records.
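The two rules above can be modeled with a short Python check, which may help when testing a Web API before connecting it to an indexer. `accept_response` is a hypothetical helper, not the indexer's implementation. It assumes that media-type parameters such as charset are ignored when checking Content-Type, and that when a recordId appears more than once in the response, every copy is rejected.

```python
import json
from collections import Counter
from typing import Any, Dict, Set


def accept_response(
    content_type: str, body: str, sent_ids: Set[str]
) -> Dict[str, Dict[str, Any]]:
    """Return the usable response records keyed by recordId.

    Returns an empty dict when the response is not application/json.
    Records whose recordId was not sent, or which appears more than once,
    are dropped, so no enrichment is applied for them.
    """
    # Ignore parameters such as "; charset=utf-8" (an assumption).
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json":
        return {}
    records = json.loads(body).get("values", [])
    counts = Counter(record.get("recordId") for record in records)
    return {
        record["recordId"]: record
        for record in records
        if record.get("recordId") in sent_ids and counts[record["recordId"]] == 1
    }


if __name__ == "__main__":
    body = json.dumps({"values": [
        {"recordId": "0", "data": {"hitPositions": [0]}, "errors": None, "warnings": None},
        {"recordId": "7", "data": {}, "errors": None, "warnings": None},
    ]})
    print(sorted(accept_response("application/json", body, {"0", "1"})))  # ['0']
```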

When the Web API is unavailable or returns an HTTP error, a friendly error with any available details about the HTTP error is added to the indexer execution history.

See also