Troubleshooting common indexer errors and warnings in Azure Cognitive Search

This article provides information and solutions to common errors and warnings you might encounter during indexing and AI enrichment in Azure Cognitive Search.

Indexing stops when the error count exceeds 'maxFailedItems'.

If you want indexers to ignore these errors (and skip over "failed documents"), consider raising the maxFailedItems and maxFailedItemsPerBatch settings on the indexer.
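
As an illustration of the two settings named above, here is a minimal Python sketch that builds the "parameters" section of an indexer definition. The helper name failure_tolerant_parameters is hypothetical, and treating -1 as "no limit" is an assumption to verify against the indexer reference before relying on it.

```python
import json


def failure_tolerant_parameters(max_failed_items: int, max_failed_items_per_batch: int) -> dict:
    """Build the 'parameters' section of an indexer definition (hypothetical helper)."""
    for value in (max_failed_items, max_failed_items_per_batch):
        # Assumption: -1 means "no limit"; anything lower is rejected here.
        if value < -1:
            raise ValueError("use -1 for 'no limit', or a non-negative count")
    return {
        "maxFailedItems": max_failed_items,
        "maxFailedItemsPerBatch": max_failed_items_per_batch,
    }


# Tolerate up to 10 failed documents overall and 5 per batch.
params = failure_tolerant_parameters(10, 5)
print(json.dumps({"parameters": params}, indent=2))
```

The resulting dictionary would be merged into the indexer definition you send when creating or updating the indexer.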

Note

Each failed document, along with its document key (when available), shows up as an error in the indexer execution status. If you have set the indexer to tolerate failures, you can use the index API to upload those documents manually at a later point.

The error information in this article can help you resolve errors, allowing indexing to continue.

Warnings do not stop indexing, but they do indicate conditions that could result in unexpected outcomes. Whether you take action or not depends on the data and your scenario.

Beginning with API version 2019-05-06, item-level indexer errors and warnings are structured to provide increased clarity around causes and next steps. They contain the following properties:

| Property | Description | Example |
| --- | --- | --- |
| key | The document ID of the document impacted by the error or warning. | https://coromsearch.blob.core.chinacloudapi.cn/jfk-1k/docid-32112954.pdf |
| name | The operation name describing where the error or warning occurred. This is generated by the following structure: [category].[subcategory].[resourceType].[resourceName] | DocumentExtraction.azureblob.myBlobContainerName, Enrichment.WebApiSkill.mySkillName, Projection.SearchIndex.OutputFieldMapping.myOutputFieldName, Projection.SearchIndex.MergeOrUpload.myIndexName, Projection.KnowledgeStore.Table.myTableName |
| message | A high-level description of the error or warning. | Could not execute skill because the Web Api request failed. |
| details | Any additional details that may help diagnose the issue, such as the Web API response if executing a custom skill failed. | link-cryptonyms-list - Error processing the request record : System.ArgumentNullException: Value cannot be null. Parameter name: source at System.Linq.Enumerable.All[TSource](IEnumerable1 source, Func2 predicate) at Microsoft.CognitiveSearch.WebApiSkills.JfkWebApiSkills. ...rest of stack trace... |
| documentationLink | A link to relevant documentation with detailed information to debug and resolve the issue. This link will often point to one of the sections below on this page. | https://docs.azure.cn/search/cognitive-search-common-errors-warnings#could-not-execute-skill-because-the-web-api-request-failed |
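
To show how these properties might be used when triaging a run, the following Python sketch groups failed document keys by the first segment of each error's name. The sample records are invented for illustration and only mirror the property names in the table above; the function name group_keys_by_category is hypothetical.

```python
from collections import defaultdict

# Hypothetical item-level errors, shaped after the properties in the table above.
sample_errors = [
    {"key": "doc-1", "name": "Enrichment.WebApiSkill.mySkillName",
     "message": "Could not execute skill because the Web Api request failed."},
    {"key": "doc-2", "name": "Projection.SearchIndex.MergeOrUpload.myIndexName",
     "message": "Could not 'MergeOrUpload' document to the search index"},
    {"key": "doc-3", "name": "Enrichment.WebApiSkill.mySkillName",
     "message": "Could not execute skill because the Web Api request failed."},
]


def group_keys_by_category(errors):
    """Map the first segment of each 'name' (its category) to the affected document keys."""
    grouped = defaultdict(list)
    for error in errors:
        category = error["name"].split(".", 1)[0]
        grouped[category].append(error["key"])
    return dict(grouped)


summary = group_keys_by_category(sample_errors)
print(summary)
```

Grouping by category first makes it easier to see whether failures cluster in document extraction, enrichment, or projection before reading individual messages.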

Error: Could not read document

Indexer was unable to read the document from the data source. This can happen due to:

| Reason | Details/Example | Resolution |
| --- | --- | --- |
| Inconsistent field types across different documents | "Type of value has a mismatch with column type. Couldn't store '{47.6,-122.1}' in authors column. Expected type is JArray." "Error converting data type nvarchar to float." "Conversion failed when converting the nvarchar value '12 months' to data type int." "Arithmetic overflow error converting expression to data type int." | Ensure that the type of each field is the same across different documents. For example, if the 'startTime' field is a DateTime in the first document and a string in the second, this error occurs. |
| Errors from the data source's underlying service | (from Cosmos DB) {"Errors":["Request rate is large"]} | Check your storage instance to ensure it's healthy. You may need to adjust your scaling/partitioning. |
| Transient issues | A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host) | Occasionally there are unexpected connectivity issues. Try running the document through your indexer again later. |

Error: Could not extract content or metadata from your document

Indexer with a Blob data source was unable to extract the content or metadata from the document (for example, a PDF file). This can happen due to:

| Reason | Details/Example | Resolution |
| --- | --- | --- |
| Blob is over the size limit | Document is '150441598' bytes, which exceeds the maximum size '134217728' bytes for document extraction for your current service tier. | See blob indexing errors. |
| Blob has unsupported content type | Document has unsupported content type 'image/png' | See blob indexing errors. |
| Blob is encrypted | Document could not be processed - it may be encrypted or password protected. | You can skip the blob with blob settings. |
| Transient issues | "Error processing blob: The request was aborted: The request was canceled." "Document timed out during processing." | Occasionally there are unexpected connectivity issues. Try running the document through your indexer again later. |
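
The "blob settings" mentioned in the table are indexer configuration options. The Python sketch below shows one way such a configuration might be assembled. The three property names are not given in this article; they are stated here as an assumption based on the blob indexer documentation and should be checked there. The helper name lenient_blob_parameters is hypothetical.

```python
import json


def lenient_blob_parameters() -> dict:
    """Build indexer parameters that skip problematic blobs instead of failing on them.

    Assumption: these configuration property names come from the blob indexer
    documentation, not from this article; verify them before use.
    """
    return {
        "configuration": {
            # Skip blobs whose content type cannot be processed.
            "failOnUnsupportedContentType": False,
            # Skip blobs that cannot be parsed (for example, encrypted files).
            "failOnUnprocessableDocument": False,
            # For oversized blobs, index storage metadata only.
            "indexStorageMetadataOnlyForOversizedDocuments": True,
        }
    }


blob_parameters = lenient_blob_parameters()
print(json.dumps({"parameters": blob_parameters}, indent=2))
```

Skipping blobs this way trades completeness for progress, so it is worth reviewing which documents were skipped after the run.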

Error: Could not parse document

Indexer read the document from the data source, but there was an issue converting the document content into the specified field mapping schema. This can happen due to:

| Reason | Details/Example | Resolution |
| --- | --- | --- |
| The document key is missing | Document key cannot be missing or empty | Ensure all documents have valid document keys. The document key is determined by setting the 'key' property as part of the index definition. Indexers emit this error when the property flagged as the 'key' cannot be found on a particular document. |
| The document key is invalid | Document key cannot be longer than 1024 characters | Modify the document key to meet the validation requirements. |
| Could not apply field mapping to a field | Could not apply mapping function 'functionName' to field 'fieldName'. Array cannot be null. Parameter name: bytes | Double check the field mappings defined on the indexer, and compare them with the data of the specified field of the failed document. It may be necessary to modify the field mappings or the document data. |
| Could not read field value | Could not read the value of column 'fieldName' at index 'fieldIndex'. A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) | These errors are typically due to unexpected connectivity issues with the data source's underlying service. Try running the document through your indexer again later. |
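
The two key-related rows above can be checked before documents ever reach the indexer. This Python sketch covers only the two conditions quoted in the table (missing or empty, and longer than 1024 characters); the service may enforce further rules that are not listed here. The function name document_key_problem is hypothetical.

```python
MAX_KEY_LENGTH = 1024  # limit quoted in the error message above


def document_key_problem(key):
    """Return a description of the problem with `key`, or None if it passes these two checks."""
    if key is None or key == "":
        return "Document key cannot be missing or empty"
    if len(key) > MAX_KEY_LENGTH:
        return "Document key cannot be longer than 1024 characters"
    return None


for candidate in ["docid-32112954", "", "k" * 1025]:
    print(repr(candidate[:20]), "->", document_key_problem(candidate))
```

Running a check like this over the source data can surface bad keys in bulk instead of one indexer error at a time.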

Error: Could not map output field 'xyz' to search index due to deserialization problem while applying mapping function 'abc'

The output mapping might have failed because the output data is in the wrong format for the mapping function you are using. For example, applying the Base64Encode mapping function to binary data would generate this error. To resolve the issue, either rerun the indexer without specifying a mapping function, or ensure that the mapping function is compatible with the output field data type. See Output field mapping for details.

Error: Could not execute skill

Indexer was not able to run a skill in the skillset.

| Reason | Details/Example | Resolution |
| --- | --- | --- |
| Transient connectivity issues | A transient error occurred. Please try again later. | Occasionally there are unexpected connectivity issues. Try running the document through your indexer again later. |
| Potential product bug | An unexpected error occurred. | This indicates an unknown class of failure and may mean there is a product bug. Please file a support ticket to get help. |
| A skill has encountered an error during execution | (From Merge Skill) One or more offset values were invalid and could not be parsed. Items were inserted at the end of the text | Use the information in the error message to fix the issue. This kind of failure requires action to resolve. |

Error: Could not execute skill because the Web API request failed

Skill execution failed because the call to the Web API failed. Typically, this class of failure occurs when custom skills are used, in which case you will need to debug your custom code to resolve the issue. If instead the failure is from a built-in skill, refer to the error message for help in fixing the issue.

While debugging this issue, be sure to pay attention to any skill input warnings for this skill. Your Web API endpoint may be failing because the indexer is passing it unexpected input.

Error: Could not execute skill because Web API skill response is invalid

Skill execution failed because the call to the Web API returned an invalid response. Typically, this class of failure occurs when custom skills are used, in which case you will need to debug your custom code to resolve the issue. If instead the failure is from a built-in skill, please file a support ticket to get assistance.

Error: Skill did not execute within the time limit

There are two cases in which you may encounter this error message, and each should be treated differently. Follow the instructions below depending on which skill returned this error.

Built-in Cognitive Service skills

Many of the built-in cognitive skills, such as language detection, entity recognition, or OCR, are backed by a Cognitive Service API endpoint. Sometimes there are transient issues with these endpoints and a request times out. For transient issues, there is no remedy except to wait and try again. As a mitigation, consider setting your indexer to run on a schedule. Scheduled indexing picks up where it left off. Assuming the transient issues are resolved, indexing and cognitive skill processing should be able to continue on the next scheduled run.

If you continue to see this error on the same document for a built-in cognitive skill, please file a support ticket to get assistance, as this is not expected.

Custom skills

If you encounter a timeout error with a custom skill you have created, there are a couple of things you can try. First, review your custom skill and ensure that it is not getting stuck in an infinite loop and that it returns a result consistently. Once you have confirmed that is the case, determine the execution time of your skill. If you didn't explicitly set a timeout value on your custom skill definition, the default timeout is 30 seconds. If 30 seconds is not long enough for your skill to execute, you may specify a higher timeout value on your custom skill definition. Here is an example of a custom skill definition where the timeout is set to 90 seconds:

{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "uri": "<your custom skill uri>",
    "batchSize": 1,
    "timeout": "PT90S",
    "context": "/document",
    "inputs": [
        {
            "name": "input",
            "source": "/document/content"
        }
    ],
    "outputs": [
        {
            "name": "output",
            "targetName": "output"
        }
    ]
}

The maximum value that you can set for the timeout parameter is 230 seconds. If your custom skill is unable to execute consistently within 230 seconds, consider reducing the batchSize of your custom skill so that it has fewer documents to process within a single execution. If you have already set batchSize to 1, you will need to rewrite the skill so that it executes in under 230 seconds, or split it into multiple custom skills so that the execution time for any single custom skill is at most 230 seconds. Review the custom skill documentation for more information.
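
The decision steps in the paragraph above can be sketched in Python as follows. The function names timeout_duration and suggest_change are hypothetical, and the "slowest run" figure is something you would measure yourself; only the 230-second ceiling and the "PT90S" duration format come from this article.

```python
import math

MAX_TIMEOUT_SECONDS = 230  # maximum timeout stated above


def timeout_duration(seconds: int) -> str:
    """Format a timeout as a duration string such as 'PT90S'."""
    if not 1 <= seconds <= MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_SECONDS} seconds")
    return f"PT{seconds}S"


def suggest_change(slowest_run_seconds: float, batch_size: int) -> str:
    """Suggest the next adjustment for a custom skill that times out."""
    if slowest_run_seconds <= MAX_TIMEOUT_SECONDS:
        # The skill fits under the ceiling: raise the timeout to cover it.
        return f'set "timeout" to {timeout_duration(math.ceil(slowest_run_seconds))}'
    if batch_size > 1:
        # Too slow even at the ceiling: give each call fewer documents.
        return 'reduce "batchSize"'
    # Already one document per call: the skill itself has to change.
    return "rewrite or split the skill"


print(suggest_change(75.2, 1))
print(suggest_change(300, 10))
print(suggest_change(300, 1))
```

In practice you would probably add some headroom above the slowest observed run instead of using it exactly.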

Error: Could not 'MergeOrUpload' | 'Delete' document to the search index

The document was read and processed, but the indexer could not add it to the search index. This can happen due to:

| Reason | Details/Example | Resolution |
| --- | --- | --- |
| A field contains a term that is too large | A term in your document is larger than the 32 KB limit | You can avoid this restriction by ensuring the field is not configured as filterable, facetable, or sortable. |
| Document is too large to be indexed | A document is larger than the maximum API request size | See How to index large data sets. |
| Document contains too many objects in collection | A collection in your document exceeds the limit on maximum elements across all complex collections. "The document with key '1000052' has '4303' objects in collections (JSON arrays). At most '3000' objects are allowed to be in collections across the entire document. Please remove objects from collections and try indexing the document again." | We recommend reducing the size of the complex collection in the document to below the limit and avoiding high storage utilization. |
| Trouble connecting to the target index (that persists after retries) because the service is under other load, such as querying or indexing. | Failed to establish connection to update index. Search service is under heavy load. | Scale up your search service. |
| Search service is being patched for a service update, or is in the middle of a topology reconfiguration. | Failed to establish connection to update index. Search service is currently down/Search service is undergoing a transition. | Configure the service with at least 3 replicas for 99.9% availability per the SLA documentation. |
| Failure in the underlying compute/networking resource (rare) | Failed to establish connection to update index. An unknown failure occurred. | Configure indexers to run on a schedule to pick up from a failed state. |
| An indexing request made to the target index was not acknowledged within a timeout period due to network issues. | Could not establish connection to the search index in a timely manner. | Configure indexers to run on a schedule to pick up from a failed state. Additionally, try lowering the indexer batch size if this error condition persists. |
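
For the "too many objects in collection" row, a rough pre-check can be run on source documents. The Python sketch below counts JSON objects that sit inside arrays at any depth. This is only an approximation of the rule quoted in the error message; the service's exact counting method is not described in this article. The function name count_collection_objects is hypothetical.

```python
MAX_COLLECTION_OBJECTS = 3000  # limit quoted in the error message above


def count_collection_objects(value) -> int:
    """Count JSON objects that sit inside arrays, at any nesting depth (approximation)."""
    if isinstance(value, list):
        # Each dict in the array counts once, plus whatever is nested inside it.
        return sum(
            (1 if isinstance(item, dict) else 0) + count_collection_objects(item)
            for item in value
        )
    if isinstance(value, dict):
        return sum(count_collection_objects(child) for child in value.values())
    return 0


document = {
    "id": "1",
    "rooms": [
        {"name": "a", "tags": ["x", "y"], "amenities": [{"n": 1}, {"n": 2}]},
        {"name": "b"},
    ],
}
total = count_collection_objects(document)
print(total, "objects in collections; over limit:", total > MAX_COLLECTION_OBJECTS)
```

Documents that come close to the limit are candidates for splitting into several smaller documents or for trimming their nested collections.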

Error: Could not index document because some of the document's data was not valid

The document was read and processed by the indexer, but due to a mismatch between the configuration of the index fields and the data extracted and processed by the indexer, it could not be added to the search index. This can happen due to:

| Reason | Details/Example |
| --- | --- |
| Data type of the field(s) extracted by the indexer is incompatible with the data model of the corresponding target index field. | The data field 'data' in the document with key '888' has an invalid value 'of type 'Edm.String''. The expected type was 'Collection(Edm.String)'. |
| Failed to extract any JSON entity from a string value. | Could not parse value 'of type 'Edm.String'' of field 'data' as a JSON object. Error:'After parsing a value an unexpected character was encountered: ''. Path 'path', line 1, position 3162.' |
| Failed to extract a collection of JSON entities from a string value. | Could not parse value 'of type 'Edm.String'' of field 'data' as a JSON array. Error:'After parsing a value an unexpected character was encountered: ''. Path '[0]', line 1, position 27.' |
| An unknown type was discovered in the source document. | Unknown type 'unknown' cannot be indexed |
| An incompatible notation for geography points was used in the source document. | WKT POINT string literals are not supported. Please use GeoJson point literals instead |

In all these cases, refer to Supported Data types and Data type map for indexers to make sure that you build the index schema correctly and have set up appropriate indexer field mappings. The error message includes details that can help track down the source of the mismatch.
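
For the geography point case in the table above, source data in WKT form can be converted before indexing. The Python sketch below handles only simple two-dimensional points of the form POINT(x y) and assumes the WKT value stores longitude first, then latitude. The function name wkt_point_to_geojson is hypothetical.

```python
import re

# Matches simple 2D points such as "POINT(-122.1 47.6)"; other WKT forms are rejected.
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_geojson(wkt: str) -> dict:
    """Convert 'POINT(lon lat)' into a GeoJSON Point object."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a simple WKT point: {wkt!r}")
    longitude, latitude = (float(group) for group in match.groups())
    # GeoJSON coordinates are ordered [longitude, latitude].
    return {"type": "Point", "coordinates": [longitude, latitude]}


print(wkt_point_to_geojson("POINT(-122.1 47.6)"))
```

If your source stores latitude first, swap the two values before building the GeoJSON object.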

Error: Integrated change tracking policy cannot be used because table has a composite primary key

This applies to SQL tables, and usually happens when the key is defined as a composite key or when the table has a unique clustered index (as in a SQL index, not an Azure Search index). The main reason is that, in the case of a unique clustered index, the key attribute is modified to be a composite primary key. In that case, make sure that your SQL table does not have a unique clustered index, or that you map the key field to a field that is guaranteed not to have duplicate values.

Error: Could not process document within indexer max run time

This error occurs when the indexer is unable to finish processing a single document from the data source within the allowed execution time. The maximum running time is shorter when skillsets are used. When this error occurs, if you have maxFailedItems set to a value other than 0, the indexer bypasses the document on future runs so that indexing can progress. If you cannot afford to skip any document, or if you are seeing this error consistently, consider breaking documents into smaller documents so that partial progress can be made within a single indexer execution.

<a name="could-not-project-document"></a>

Error: Could not project document

This error occurs when the indexer attempts to project data into a knowledge store and the attempt fails. The failure could be consistent and fixable, or it could be a transient failure with the projection output sink, in which case you may need to wait and retry in order to resolve it. Here is a set of known failure states and possible resolutions.

| Reason | Details/Example | Resolution |
| --- | --- | --- |
| Could not update projection blob 'blobUri' in container 'containerName' | The specified container does not exist. | The indexer checks whether the specified container has been previously created and creates it if necessary, but it only performs this check once per indexer run. This error means that something deleted the container after this step. To resolve this error, leave your storage account information alone, wait for the indexer to finish, and then rerun the indexer. |
| Could not update projection blob 'blobUri' in container 'containerName' | Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host. | This is expected to be a transient failure with Azure Storage and thus should be resolved by rerunning the indexer. If you encounter this error consistently, please file a support ticket so it can be investigated further. |
| Could not update row 'projectionRow' in table 'tableName' | The server is busy. | This is expected to be a transient failure with Azure Storage and thus should be resolved by rerunning the indexer. If you encounter this error consistently, please file a support ticket so it can be investigated further. |

Warning: Skill input was invalid

An input to the skill was missing, the wrong type, or otherwise invalid. The warning message will indicate the impact:

  1. Could not execute skill
  2. Skill executed but may have unexpected results

Cognitive skills have required inputs and optional inputs. For example, the Key phrase extraction skill has two required inputs, text and languageCode, and no optional inputs. Custom skill inputs are all considered optional inputs.

If any required inputs are missing or if any input is not the right type, the skill is skipped and generates a warning. Skipped skills do not generate any outputs, so other skills that use outputs of the skipped skill may generate additional warnings.

If an optional input is missing, the skill will still run but may produce unexpected output due to the missing input.

In both cases, this warning may be expected due to the shape of your data. For example, if you have a document containing information about people with the fields firstName, middleName, and lastName, some documents may not have an entry for middleName. If you pass middleName as an input to a skill in the pipeline, then it is expected that this skill input may be missing some of the time. You will need to evaluate your data and scenario to determine whether any action is required as a result of this warning.

If you want to provide a default value in case of missing input, you can use the Conditional skill to generate a default value and then use the output of the Conditional skill as the skill input.

{
    "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
    "context": "/document",
    "inputs": [
        { "name": "condition", "source": "= $(/document/language) == null" },
        { "name": "whenTrue", "source": "= 'en'" },
        { "name": "whenFalse", "source": "= $(/document/language)" }
    ],
    "outputs": [ { "name": "output", "targetName": "languageWithDefault" } ]
}

| Reason | Details/Example | Resolution |
| --- | --- | --- |
| Skill input is the wrong type | "Required skill input was not of the expected type String. Name: text, Source: /document/merged_content." "Required skill input was not of the expected format. Name: text, Source: /document/merged_content." "Cannot iterate over non-array /document/normalized_images/0/imageCelebrities/0/detail/celebrities." "Unable to select 0 in non-array /document/normalized_images/0/imageCelebrities/0/detail/celebrities" | Certain skills expect inputs of particular types; for example, the Sentiment skill expects text to be a string. If the input specifies a non-string value, the skill doesn't execute and generates no outputs. Ensure your data set has input values uniform in type, or use a Custom Web API skill to preprocess the input. If you're iterating the skill over an array, check that the skill context and input have * in the correct positions. Usually both the context and input source should end with * for arrays. |
| Skill input is missing | "Required skill input is missing. Name: text, Source: /document/merged_content" "Missing value /document/normalized_images/0/imageTags." "Unable to select 0 in array /document/pages of length 0." | If all your documents get this warning, most likely there is a typo in the input paths. Double check property name casing and extra or missing * in the path, and make sure that the documents from the data source provide the required inputs. |
| Skill language code input is invalid | Skill input languageCode has the following language codes X,Y,Z, at least one of which is invalid. | See more details below. |

Warning: Skill input 'languageCode' has the following language codes 'X,Y,Z', at least one of which is invalid.

One or more of the values passed into the optional languageCode input of a downstream skill is not supported. This can occur if you are passing the output of the LanguageDetectionSkill to subsequent skills, and the output consists of more languages than are supported in those downstream skills.

If you know that your data set is all in one language, you should remove the LanguageDetectionSkill and the languageCode skill input, and use the defaultLanguageCode skill parameter for that skill instead, assuming the language is supported for that skill.

If you know that your data set contains multiple languages and thus you need the LanguageDetectionSkill and the languageCode input, consider adding a ConditionalSkill to filter out text in unsupported languages before passing the text to the downstream skill. Here is an example of what this might look like for the EntityRecognitionSkill:

{
    "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
    "context": "/document",
    "inputs": [
        { "name": "condition", "source": "= $(/document/language) == 'de' || $(/document/language) == 'en' || $(/document/language) == 'es' || $(/document/language) == 'fr' || $(/document/language) == 'it'" },
        { "name": "whenTrue", "source": "/document/content" },
        { "name": "whenFalse", "source": "= null" }
    ],
    "outputs": [ { "name": "output", "targetName": "supportedByEntityRecognitionSkill" } ]
}
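
When the list of supported languages is long, the condition string in the definition above can be generated instead of typed by hand. This Python sketch reproduces the same expression from a list of language codes; the function name language_filter_condition is hypothetical, and the list of codes is whatever the downstream skill's documentation says it supports.

```python
def language_filter_condition(supported, path="/document/language"):
    """Build a ConditionalSkill condition that is true when the language is in `supported`."""
    if not supported:
        raise ValueError("at least one language code is required")
    clauses = [f"$({path}) == '{code}'" for code in supported]
    return "= " + " || ".join(clauses)


condition = language_filter_condition(["de", "en", "es", "fr", "it"])
print(condition)
```

The returned string would be used as the "source" of the "condition" input in the ConditionalSkill definition.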

For the languages currently supported by each skill that may produce this error message, see the reference documentation for that skill.

Warning: Skill input was truncated

Cognitive skills have limits on the length of text that can be analyzed at once. If the text input of one of these skills is over that limit, the text is truncated to meet the limit, and the enrichment is then performed on the truncated text. This means that the skill is executed, but not over all of your data.

In the example LanguageDetectionSkill below, the 'text' input field may trigger this warning if it is over the character limit. You can find the skill input limits in the skills documentation.

{
    "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
    "inputs": [
        {
            "name": "text",
            "source": "/document/text"
        }
    ],
    "outputs": [...]
}

If you want to ensure that all text is analyzed, consider using the Split skill.
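
The idea behind splitting can be sketched locally in Python: break a long string into consecutive chunks so that no single chunk exceeds a character limit. This is only an illustration of the principle, not how the Split skill is implemented. The function name `split_text` and the 50,000-character limit are assumptions made for the example; check the skills documentation for the actual limit of the skill you use.

```python
def split_text(text: str, max_chars: int) -> list[str]:
    """Split text into consecutive chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


# Hypothetical limit chosen for illustration only.
ASSUMED_LIMIT = 50_000

document_text = "a" * 120_000
chunks = split_text(document_text, ASSUMED_LIMIT)

# Every chunk now fits within the assumed limit, and no text is dropped.
print([len(chunk) for chunk in chunks])
```

A naive character split like this can cut words or sentences in half, which is one reason to prefer a splitter that respects text boundaries when enrichment quality matters.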

Warning: Web API skill response contains warnings

The indexer was able to run a skill in the skillset, but the response from the Web API request indicated there were warnings during execution. Review the warnings to understand how your data is impacted and whether or not action is required.

Warning: The current indexer configuration does not support incremental progress

This warning only occurs for Cosmos DB data sources.

Incremental progress during indexing ensures that if indexer execution is interrupted by transient failures or by the execution time limit, the indexer can pick up where it left off the next time it runs, instead of having to re-index the entire collection from scratch. This is especially important when indexing large collections.

The ability to resume an unfinished indexing job is predicated on having documents ordered by the _ts column. The indexer uses the timestamp to determine which document to pick up next. If the _ts column is missing, or if the indexer can't determine whether a custom query is ordered by it, the indexer starts from the beginning and you'll see this warning.

You can override this behavior, enabling incremental progress and suppressing this warning, by using the assumeOrderByHighWatermarkColumn configuration property.

For more information, see Incremental progress and custom queries.
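
As a minimal sketch, the following Python snippet sets the property on an indexer definition held as a dictionary and prints the resulting JSON. It assumes that assumeOrderByHighWatermarkColumn belongs under `parameters.configuration` of the indexer definition; the indexer, data source, and index names are hypothetical, and the helper `enable_assume_order` is introduced only for this example.

```python
import copy
import json


def enable_assume_order(indexer: dict) -> dict:
    """Return a copy of the indexer definition with the override enabled."""
    updated = copy.deepcopy(indexer)
    # Assumption: the property lives under parameters.configuration.
    configuration = updated.setdefault("parameters", {}).setdefault("configuration", {})
    configuration["assumeOrderByHighWatermarkColumn"] = True
    return updated


# Hypothetical indexer definition used for illustration.
indexer = {
    "name": "my-cosmosdb-indexer",
    "dataSourceName": "my-cosmosdb-datasource",
    "targetIndexName": "my-index",
}

updated_indexer = enable_assume_order(indexer)
print(json.dumps(updated_indexer, indent=4))
```

Only enable this override when the custom query really does return documents ordered by _ts; otherwise the indexer may resume from the wrong position.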

Warning: Some data was lost during projection. Row 'X' in table 'Y' has string property 'Z' which was too long.

The Table Storage service limits how large entity properties can be. Strings can contain at most 32,000 characters. If a row with a string property longer than 32,000 characters is being projected, only the first 32,000 characters are preserved. To work around this issue, avoid projecting rows with string properties longer than 32,000 characters.
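
One way to spot affected rows before projecting them is a simple length check. The sketch below is a local illustration only: the helper name `oversized_properties` and the sample row are hypothetical, and the 32,000-character limit is the one stated above.

```python
MAX_TABLE_STRING_CHARS = 32_000  # limit stated in this article


def oversized_properties(row: dict, limit: int = MAX_TABLE_STRING_CHARS) -> list[str]:
    """Return the names of string properties longer than the limit."""
    return [
        name
        for name, value in row.items()
        if isinstance(value, str) and len(value) > limit
    ]


# Hypothetical row that is about to be projected to a table.
row = {"id": "doc-1", "summary": "short text", "content": "x" * 40_000}

too_long = oversized_properties(row)
print(too_long)
```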

Warning: Truncated extracted text to X characters

Indexers limit how much text can be extracted from any one document. This limit depends on the pricing tier: 32,000 characters for Free tier, 64,000 for Basic, 4 million for Standard, 8 million for Standard S2, and 16 million for Standard S3. Text that was truncated will not be indexed. To avoid this warning, try breaking apart documents with large amounts of text into multiple, smaller documents.

For more information, see Indexer limits.
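
The per-tier limits above can be captured in a small lookup to check a document's extracted text before indexing. This is a rough sketch: the tier keys and the helper `will_be_truncated` are hypothetical names chosen for the example, and the numbers are copied from the paragraph above.

```python
# Character limits per pricing tier, as listed in this article.
EXTRACTION_LIMITS = {
    "free": 32_000,
    "basic": 64_000,
    "standard": 4_000_000,
    "standard_s2": 8_000_000,
    "standard_s3": 16_000_000,
}


def will_be_truncated(text: str, tier: str) -> bool:
    """Return True if the text exceeds the extraction limit for the tier."""
    return len(text) > EXTRACTION_LIMITS[tier]


sample_text = "x" * 50_000
print(will_be_truncated(sample_text, "free"))
print(will_be_truncated(sample_text, "basic"))
```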

Warning: Could not map output field 'X' to search index

Output field mappings that reference non-existent or null data produce a warning for each document and result in an empty index field. To work around this issue, double-check your output field-mapping source paths for possible typos, or set a default value using the Conditional skill. See Output field mapping for details.

Reason | Details/Example | Resolution
Cannot iterate over non-array | "Cannot iterate over non-array /document/normalized_images/0/imageCelebrities/0/detail/celebrities." | This occurs when the output is not an array. If you think the output should be an array, check the indicated output source field path for errors. For example, you might have a missing or extra * in the source field name. It's also possible that the input to this skill is null, resulting in an empty array. Find similar details in the Skill input was invalid section.
Unable to select 0 in non-array | "Unable to select 0 in non-array /document/pages." | This can happen if the skill's output does not produce an array and the output source field name has an array index or * in its path. Double-check the paths provided in the output source field names and the field value for the indicated field name. Find similar details in the Skill input was invalid section.

Warning: The data change detection policy is configured to use key column 'X'

Data change detection policies have specific requirements for the columns they use to detect change. One of these requirements is that the column is updated every time the source item is changed. Another requirement is that the new value for the column is greater than the previous value. Key columns don't fulfill these requirements because they don't change on every update. To work around this issue, select a different column for the change detection policy.

Warning: Document text appears to be UTF-16 encoded, but is missing a byte order mark

The indexer parsing modes need to know how text is encoded before parsing it. The two most common ways of encoding text are UTF-16 and UTF-8. UTF-8 is a variable-length encoding where each character is between 1 byte and 4 bytes long. UTF-16 encodes each character as one or two 2-byte code units, so a character takes 2 or 4 bytes. UTF-16 has two variants, "big endian" and "little endian", which differ in byte order. Text encoding is determined by a "byte order mark", a series of bytes before the text.

Encoding | Byte Order Mark
UTF-16 Big Endian | 0xFE 0xFF
UTF-16 Little Endian | 0xFF 0xFE
UTF-8 | 0xEF 0xBB 0xBF

If no byte order mark is present, the text is assumed to be encoded as UTF-8.

To work around this warning, determine what the text encoding for this blob is and add the appropriate byte order mark.
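
A minimal Python sketch of that fix, using the byte order marks from the table above (available as constants in the standard `codecs` module). The helper names `detect_bom` and `add_bom` are hypothetical, and the sketch assumes you already know the blob's real encoding; it does not try to guess it, and it does not handle other encodings such as UTF-32.

```python
import codecs

# Byte order marks for the encodings listed in this article.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # 0xEF 0xBB 0xBF
    "utf-16-be": codecs.BOM_UTF16_BE,  # 0xFE 0xFF
    "utf-16-le": codecs.BOM_UTF16_LE,  # 0xFF 0xFE
}


def detect_bom(data: bytes):
    """Return the encoding name whose BOM starts the data, or None."""
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return None


def add_bom(data: bytes, encoding: str) -> bytes:
    """Prepend the BOM for the given encoding unless one is already present."""
    if detect_bom(data) is not None:
        return data
    return BOMS[encoding] + data


# Sample content encoded as UTF-16 little endian without a BOM.
raw = "hello".encode("utf-16-le")
fixed = add_bom(raw, "utf-16-le")

print(detect_bom(raw))
print(detect_bom(fixed))
```

After adding the mark, the corrected bytes would need to be uploaded back to the blob for the indexer to pick up the change.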

Warning: Cosmos DB collection 'X' has a Lazy indexing policy. Some data may be lost

Collections with Lazy indexing policies can't be queried consistently, which can result in your indexer missing data. To work around this warning, change your indexing policy to Consistent.