Diagnose and troubleshoot issues when using the Azure Functions trigger for Cosmos DB

This article covers common issues, workarounds, and diagnostic steps when you use the Azure Functions trigger for Cosmos DB.

Dependencies

The Azure Functions trigger and bindings for Cosmos DB depend on extension packages layered over the base Azure Functions runtime. Always keep these packages updated, because they might include fixes and new features that address the issues you encounter.

Unless explicitly specified, this article refers to Azure Functions V2 whenever the runtime is mentioned.

Consume the Azure Cosmos DB SDK independently

The key functionality of the extension package is to provide support for the Azure Functions trigger and bindings for Cosmos DB. It also includes the Azure Cosmos DB .NET SDK, which is helpful if you want to interact with Azure Cosmos DB programmatically without using the trigger and bindings.

If you want to use the Azure Cosmos DB SDK, make sure that you don't add another NuGet package reference to your project. Instead, let the SDK reference resolve through the Azure Functions extension package. Consume the Azure Cosmos DB SDK separately from the trigger and bindings.

Additionally, if you manually create your own instance of the Azure Cosmos DB SDK client, keep only one instance of the client by following a Singleton pattern. This avoids potential socket issues in your operations.
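The single-instance advice above can be sketched as follows. This is a minimal illustration in Python, not the SDK's own API: CosmosClientStandIn is a hypothetical placeholder for whatever client class your SDK provides, and get_client is an assumed helper name.

```python
import threading


class CosmosClientStandIn:
    """Hypothetical stand-in for a real Azure Cosmos DB SDK client class."""

    instances_created = 0

    def __init__(self, connection_string):
        CosmosClientStandIn.instances_created += 1
        self.connection_string = connection_string


_client = None
_client_lock = threading.Lock()


def get_client(connection_string):
    """Return the one shared client, creating it on first use only."""
    global _client
    if _client is None:
        with _client_lock:
            # Re-check inside the lock so two threads cannot both create a client.
            if _client is None:
                _client = CosmosClientStandIn(connection_string)
    return _client
```

Every function invocation that calls get_client reuses the same object, so connections are shared instead of being opened per invocation.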

Common scenarios and workarounds

Azure Function fails with an error message that the collection doesn't exist

Azure Function fails with the error message "Either the source collection 'collection-name' (in database 'database-name') or the lease collection 'collection2-name' (in database 'database2-name') does not exist. Both collections must exist before the listener starts. To automatically create the lease collection, set 'CreateLeaseCollectionIfNotExists' to 'true'."

This means that one or both of the Azure Cosmos containers required for the trigger to work do not exist or are not reachable from the Azure Function. The error itself tells you which Azure Cosmos database and container the trigger is looking for, based on your configuration.

  1. Verify the ConnectionStringSetting attribute and that it references a setting that exists in your Azure Function App. The value of this attribute shouldn't be the connection string itself, but the name of the configuration setting.
  2. Verify that the databaseName and collectionName exist in your Azure Cosmos account. If you are using automatic value replacement (using %settingName% patterns), make sure the name of the setting exists in your Azure Function App.
  3. If you don't specify a LeaseCollectionName/leaseCollectionName, the default is "leases". Verify that such a container exists. Optionally, you can set the CreateLeaseCollectionIfNotExists attribute in your trigger to true to create it automatically.
  4. Verify your Azure Cosmos account's firewall configuration to make sure that it's not blocking the Azure Function.
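The first two checks in the list above can be approximated offline. The sketch below is an assumption-laden illustration in Python: check_trigger_binding is a hypothetical helper, the binding is modeled as a plain dictionary using the camelCase property names, and looking for "AccountEndpoint=" is only a heuristic for spotting a pasted connection string.

```python
import re

# Matches values written as %settingName% (automatic value replacement).
SETTING_PATTERN = re.compile(r"^%(.+)%$")


def check_trigger_binding(binding, app_settings):
    """Return a list of configuration problems found in a trigger binding."""
    problems = []

    conn = binding.get("connectionStringSetting", "")
    if "AccountEndpoint=" in conn:
        # Heuristic: the value looks like a connection string, not a setting name.
        problems.append("connectionStringSetting holds a connection string, not a setting name")
    elif conn not in app_settings:
        problems.append(f"app setting '{conn}' does not exist")

    for key in ("databaseName", "collectionName", "leaseCollectionName"):
        value = binding.get(key)
        if value is None:
            continue
        match = SETTING_PATTERN.match(value)
        if match and match.group(1) not in app_settings:
            problems.append(f"{key} references missing app setting '{match.group(1)}'")

    return problems
```

An empty list means these two checks passed; it says nothing about whether the containers actually exist or whether the firewall allows access.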

Azure Function fails to start with "Shared throughput collection should have a partition key"

Previous versions of the Azure Cosmos DB extension did not support using a leases container that was created within a shared throughput database. To resolve this issue, update the Microsoft.Azure.WebJobs.Extensions.CosmosDB extension to the latest version.

Azure Function fails to start with "PartitionKey must be supplied for this operation."

This error means that you are currently using a partitioned lease collection with an old extension dependency. Upgrade to the latest available version. If you are currently running on Azure Functions V1, you need to upgrade to Azure Functions V2.

Azure Function fails to start with "The lease collection, if partitioned, must have partition key equal to id."

This error means that your current leases container is partitioned, but the partition key path is not /id. To resolve this issue, recreate the leases container with /id as the partition key.

You see a "Value cannot be null. Parameter name: o" error in your Azure Functions logs when you try to run the trigger

This issue appears if you are using the Azure portal and you select the Run button on the screen when inspecting an Azure Function that uses the trigger. The trigger does not require you to select Run to start; it starts automatically when the Azure Function is deployed. If you want to check the Azure Function's log stream in the Azure portal, go to your monitored container and insert some new items. You will then see the trigger executing.

My changes take too long to be received

This scenario can have multiple causes, and all of them should be checked:

  1. Is your Azure Function deployed in the same region as your Azure Cosmos account? For optimal network latency, both the Azure Function and your Azure Cosmos account should be colocated in the same Azure region.
  2. Are the changes happening in your Azure Cosmos container continuous or sporadic? If it's the latter, there could be some delay between the changes being stored and the Azure Function picking them up. This is because internally, when the trigger checks for changes in your Azure Cosmos container and finds none pending to be read, it sleeps for a configurable amount of time (5 seconds, by default) before checking for new changes (to avoid high RU consumption). You can configure this sleep time through the FeedPollDelay/feedPollDelay setting in the configuration of your trigger (the value is expected to be in milliseconds).
  3. Your Azure Cosmos container might be rate-limited.
  4. You can use the PreferredLocations attribute in your trigger to specify a comma-separated list of Azure regions that defines a custom preferred connection order.

Some changes are repeated in my Trigger

The concept of a "change" is an operation on a document. The most common scenarios where events for the same document are received are:

  • The account is using Eventual consistency. While consuming the change feed at the Eventual consistency level, there could be duplicate events between subsequent change feed read operations (the last event of one read operation appears as the first of the next).
  • The document is being updated. The change feed can contain multiple operations for the same document; if that document is receiving updates, the trigger can pick up multiple events (one for each update). One easy way to distinguish among different operations for the same document is to track the _lsn property for each change. If the values don't match, these are different changes to the same document.
  • If you are identifying documents just by id, remember that the unique identifier for a document is the id together with its partition key (there can be two documents with the same id but different partition keys).
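One way to act on the points above is to key each change on the document id, its partition key value, and its _lsn. The following Python sketch assumes changes arrive as dictionaries and that the partition key value is a simple hashable value; drop_duplicate_changes and the partition_key_field parameter are hypothetical names.

```python
def drop_duplicate_changes(changes, partition_key_field, seen=None):
    """Filter out changes already seen, keyed on (id, partition key, _lsn).

    Pass the same `seen` set across batches to catch a change that appears
    as the last event of one read and the first event of the next.
    """
    seen = set() if seen is None else seen
    unique = []
    for doc in changes:
        key = (doc["id"], doc.get(partition_key_field), doc["_lsn"])
        if key in seen:
            continue  # same document, same operation: a true duplicate
        seen.add(key)
        unique.append(doc)
    return unique
```

Two events with the same id and partition key but different _lsn values are both kept, because they are distinct updates to the same document.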

Some changes are missing in my Trigger

If you find that some of the changes that happened in your Azure Cosmos container are not being picked up by the Azure Function, there is an initial investigation step that needs to take place.

When your Azure Function receives the changes, it often processes them and can optionally send the result to another destination. When you are investigating missing changes, make sure you measure which changes are being received at the ingestion point (when the Azure Function starts), not at the destination.

If some changes are missing at the destination, this could mean that an error happened during the Azure Function execution after the changes were received.

In this scenario, the best course of action is to add try/catch blocks in your code and inside the loops that might be processing the changes, to detect any failure for a particular subset of items and handle them accordingly (send them to another storage for further analysis or retry).

Note

The Azure Functions trigger for Cosmos DB, by default, won't retry a batch of changes if there was an unhandled exception during your code execution. This means that the changes did not arrive at the destination because your code failed to process them.
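The try/catch advice above can be sketched like this in Python. The names process_changes, handle_item, and dead_letter are hypothetical; dead_letter stands in for whatever secondary storage you choose for failed items.

```python
def process_changes(changes, handle_item, dead_letter):
    """Process each change independently so one failure doesn't drop the batch.

    Returns the number of items processed successfully. Failed items are
    appended to `dead_letter` with the error for later analysis or retry.
    """
    processed = 0
    for doc in changes:
        try:
            handle_item(doc)
            processed += 1
        except Exception as error:  # catch per item, inside the loop
            dead_letter.append({"document": doc, "error": repr(error)})
    return processed
```

Because the exception is handled per item, the remaining items in the batch are still processed, and the failed ones are preserved rather than silently lost.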

If you find that some changes were not received at all by your trigger, the most common scenario is that there is another Azure Function running. It could be another Azure Function deployed in Azure, or an Azure Function running locally on a developer's machine, that has exactly the same configuration (same monitored and lease containers). That Azure Function is stealing a subset of the changes you would expect your Azure Function to process.

Additionally, you can validate this scenario if you know how many Azure Function App instances you have running. If you inspect your leases container and look at the lease items within it, the number of distinct values of the Owner property should be equal to the number of instances of your Function App. If there are more owners than the known Azure Function App instances, these extra owners are the ones "stealing" the changes.
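The owner check described above could look like the following Python sketch. It assumes you have already read the lease items into a list of dictionaries; find_unexpected_owners is a hypothetical helper, and items without an Owner value are ignored.

```python
def find_unexpected_owners(lease_documents, expected_instance_count):
    """Return the distinct Owner values and whether there are too many.

    `lease_documents` is a list of lease items already read from the
    leases container.
    """
    owners = {doc["Owner"] for doc in lease_documents if doc.get("Owner")}
    return owners, len(owners) > expected_instance_count
```

If the second value is True, compare the returned owner names against your known instances to find the extra host.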

One easy way to work around this situation is to apply a LeaseCollectionPrefix/leaseCollectionPrefix with a new or different value to your Function or, alternatively, to test with a new leases container.

Need to restart and reprocess all the items in my container from the beginning

To reprocess all the items in a container from the beginning:

  1. Stop your Azure Function if it is currently running.
  2. Delete the documents in the lease collection (or delete and re-create the lease collection so that it is empty).
  3. Set the StartFromBeginning CosmosDBTrigger attribute in your function to true.
  4. Restart the Azure Function. It will now read and process all changes from the beginning.

Setting StartFromBeginning to true tells the Azure Function to start reading changes from the beginning of the history of the collection instead of from the current time. This only works when there are no leases already created (that is, no documents in the leases collection). Setting this property to true when leases already exist has no effect; in this scenario, when a function is stopped and restarted, it begins reading from the last checkpoint, as defined in the leases collection. To reprocess from the beginning, follow steps 1-4 above.
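The interaction between existing leases and StartFromBeginning described above can be modeled in a few lines. This Python sketch is only an illustration of the documented behavior, not the extension's actual code; resolve_start_position and its return labels are hypothetical.

```python
def resolve_start_position(lease_documents, start_from_beginning):
    """Model where the trigger starts reading, per the rules described above."""
    if lease_documents:
        # Existing leases win: StartFromBeginning has no effect.
        return "last-checkpoint"
    return "beginning" if start_from_beginning else "now"
```

This makes the reason for step 2 visible: unless the lease collection is empty, the flag is never consulted.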

Binding can only be done with IReadOnlyList<Document> or JArray

This error happens if your Azure Functions project (or any referenced project) contains a manual NuGet reference to the Azure Cosmos DB SDK with a different version than the one provided by the Azure Functions Cosmos DB extension.

To work around this situation, remove the manual NuGet reference that was added and let the Azure Cosmos DB SDK reference resolve through the Azure Functions Cosmos DB extension package.

Changing the Azure Function's polling interval for detecting changes

As explained earlier under "My changes take too long to be received", the Azure Function sleeps for a configurable amount of time (5 seconds, by default) before checking for new changes (to avoid high RU consumption). You can configure this sleep time through the FeedPollDelay/feedPollDelay setting in the configuration of your trigger (the value is expected to be in milliseconds).
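Because the value is expressed in milliseconds, it is easy to set it a factor of 1000 off. The Python sketch below models the trigger binding as a dictionary and converts a duration explicitly; with_poll_delay is a hypothetical helper, and the dictionary is only an assumed stand-in for your actual trigger configuration.

```python
from datetime import timedelta


def with_poll_delay(binding, delay):
    """Return a copy of the binding with feedPollDelay set in milliseconds."""
    millis = delay // timedelta(milliseconds=1)  # whole milliseconds as an int
    if millis < 0:
        raise ValueError("poll delay cannot be negative")
    updated = dict(binding)  # leave the caller's dictionary untouched
    updated["feedPollDelay"] = millis
    return updated
```

For example, passing timedelta(seconds=1) yields a feedPollDelay of 1000. A shorter delay reduces latency for sporadic changes at the cost of more frequent polling, which consumes more RUs.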

Next steps