Optimize query cost in Azure Cosmos DB

Azure Cosmos DB offers a rich set of database operations, including relational and hierarchical queries that operate on the items within a container. The cost associated with each of these operations varies based on the CPU, IO, and memory required to complete the operation. Instead of thinking about and managing hardware resources, you can think of a request unit (RU) as a single measure of the resources required to perform the database operations that serve a request. This article describes how to evaluate the request unit charge for a query and how to optimize the query in terms of performance and cost.

Queries in Azure Cosmos DB are typically ordered from fastest/most efficient to slower/less efficient in terms of throughput, as follows:

  • GET operation on a single partition key and item key.

  • Query with a filter clause within a single partition key.

  • Query without an equality or range filter clause on any property.

  • Query without filters.

Queries that read data from one or more partitions incur higher latency and consume a higher number of request units. Because each partition automatically indexes all properties, the query can be served efficiently from the index. You can make queries that span multiple partitions faster by using the parallelism options. To learn more about partitioning and partition keys, see Partitioning in Azure Cosmos DB.

Evaluate request unit charge for a query

Once you have stored some data in your Azure Cosmos containers, you can use the Data Explorer in the Azure portal to construct and run your queries. The Data Explorer also reports the cost of each query, which gives you a sense of the actual charges involved with the typical queries and operations that your system supports.

You can also get the cost of queries programmatically by using the SDKs. To measure the overhead of any operation, such as create, update, or delete, inspect the x-ms-request-charge header when using the REST API. If you are using the .NET or the Java SDK, the RequestCharge property is the equivalent way to get the request charge; it is present on the ResourceResponse or FeedResponse.

// Measure the performance (request units) of writes 
ResourceResponse<Document> response = await client.CreateDocumentAsync(collectionSelfLink, myDocument); 

Console.WriteLine("Insert of an item consumed {0} request units", response.RequestCharge); 

// Measure the performance (request units) of queries 
IDocumentQuery<dynamic> queryable = client.CreateDocumentQuery(collectionSelfLink, queryString).AsDocumentQuery(); 

while (queryable.HasMoreResults)
{
    FeedResponse<dynamic> queryResponse = await queryable.ExecuteNextAsync<dynamic>();
    Console.WriteLine("Query batch consumed {0} request units", queryResponse.RequestCharge);
}
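
When you call the REST API directly rather than through an SDK, the charge arrives as a string in the x-ms-request-charge response header. The following is a minimal Python sketch of extracting it from a header mapping. The helper name `request_charge` and the sample header values are illustrative assumptions, not part of any SDK.

```python
from typing import Mapping

CHARGE_HEADER = "x-ms-request-charge"


def request_charge(headers: Mapping[str, str]) -> float:
    """Return the RU charge found in response headers, or 0.0 if absent."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return float(normalized.get(CHARGE_HEADER, "0"))


# Hypothetical headers, as they might appear on a REST response.
sample_headers = {
    "Content-Type": "application/json",
    "x-ms-request-charge": "3.19",
}
print(f"Operation consumed {request_charge(sample_headers)} request units")
```

Logging this value for every operation during development gives a per-operation cost baseline comparable to what the Data Explorer reports.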

Factors influencing request unit charge for a query

The request units consumed by a query depend on a number of factors, for example the number of Azure Cosmos items loaded and returned, the number of lookups against the index, and the query compilation time. Azure Cosmos DB guarantees that the same query, when executed on the same data, always consumes the same number of request units, even with repeat executions. The query profile obtained from query execution metrics gives you a good idea of how the request units are spent.

In some cases you may see a sequence of 200 and 429 responses, and variable request units, in a paged execution of a query. This happens because queries run as fast as possible based on the available RUs. You may also see a query execution break into multiple pages (round trips) between server and client. For example, 10,000 items may be returned as multiple pages, each charged based on the computation performed for that page. When you sum the charges across these pages, you should get the same number of RUs as you would for the entire query.
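
To make the per-page accounting concrete, the following Python sketch totals the charge of a paged query execution. The function name and the per-page charges are hypothetical values chosen for illustration. In a real application each number would come from the RequestCharge of one page.

```python
from typing import Iterable


def total_request_charge(page_charges: Iterable[float]) -> float:
    """Sum the per-page RU charges of one paged query execution."""
    return sum(page_charges)


# Hypothetical charges reported for three pages of the same query.
page_charges = [12.5, 12.5, 7.25]
total = total_request_charge(page_charges)
print(f"Query consumed {total} request units over {len(page_charges)} pages")
```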

Metrics for troubleshooting

The performance of, and the throughput consumed by, queries that use user-defined functions (UDFs) mostly depend on the function body. The easiest way to find out how much of the query execution time is spent in the UDF, and how many RUs are consumed, is to enable query metrics. If you use the .NET SDK, here are sample query metrics returned by the SDK:

Retrieved Document Count                 :               1              
Retrieved Document Size                  :           9,963 bytes        
Output Document Count                    :               1              
Output Document Size                     :          10,012 bytes        
Index Utilization                        :          100.00 %            
Total Query Execution Time               :            0.48 milliseconds 
  Query Preparation Times 
    Query Compilation Time               :            0.07 milliseconds 
    Logical Plan Build Time              :            0.03 milliseconds 
    Physical Plan Build Time             :            0.05 milliseconds 
    Query Optimization Time              :            0.00 milliseconds 
  Index Lookup Time                      :            0.06 milliseconds 
  Document Load Time                     :            0.03 milliseconds 
  Runtime Execution Times 
    Query Engine Execution Time          :            0.03 milliseconds 
    System Function Execution Time       :            0.00 milliseconds 
    User-defined Function Execution Time :            0.00 milliseconds 
  Document Write Time                    :            0.00 milliseconds 
  Client Side Metrics 
    Retry Count                          :               1              
    Request Charge                       :            3.19 RUs  
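
If you capture this text output in logs, a small parser can turn it into numbers for comparison across runs. The Python sketch below assumes the "Name : value unit" layout shown above. The function name `parse_query_metrics` is hypothetical, and the sample is an excerpt of the metrics listed here.

```python
from typing import Dict


def parse_query_metrics(text: str) -> Dict[str, float]:
    """Map each 'Name : value unit' line to its numeric value."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip section headings such as "Query Preparation Times"
        # The first token after the colon is the number; drop thousands separators.
        value = rest.split()[0].replace(",", "")
        metrics[name.strip()] = float(value)
    return metrics


sample = """\
Total Query Execution Time               :            0.48 milliseconds
  Runtime Execution Times
    User-defined Function Execution Time :            0.00 milliseconds
  Client Side Metrics
    Request Charge                       :            3.19 RUs
"""

metrics = parse_query_metrics(sample)
udf_share = (
    metrics["User-defined Function Execution Time"]
    / metrics["Total Query Execution Time"]
)
print(f"UDF share of execution time: {udf_share:.0%}")
print(f"Request charge: {metrics['Request Charge']} RUs")
```

A UDF share close to 100% suggests that the function body, rather than index lookup or document loading, dominates the query cost.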

Best practices to cost optimize queries

Consider the following best practices when optimizing queries for cost:

  • Colocate multiple entity types

    Try to colocate multiple entity types within a single container or a smaller number of containers. This method yields benefits not only from a pricing perspective, but also for query execution and transactions. Queries are scoped to a single container, and atomic transactions over multiple records via stored procedures or triggers are scoped to a partition key within a single container. Colocating entities within the same container can reduce the number of network round trips needed to resolve relationships across records. It therefore increases end-to-end performance, enables atomic transactions over multiple records for a larger dataset, and as a result lowers costs. If colocating multiple entity types within a single container or a smaller number of containers is difficult for your scenario, usually because you are migrating an existing application and do not want to make any code changes, consider provisioning throughput at the database level.

  • Measure and tune for lower request units/second usage

    The complexity of a query impacts how many request units (RUs) are consumed for an operation. The number of predicates, the nature of the predicates, the number of UDFs, and the size of the source data set all influence the cost of query operations.

    The request charge returned in the response header indicates the cost of a given query. For example, if a query returns 1000 1-KB items, the cost of the operation is 1000 RUs. As such, within one second, the server honors only two such requests before rate limiting subsequent requests, which corresponds to a container provisioned with 2,000 RU/s. For more information, see the request units article and the request unit calculator.
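
The rate-limiting arithmetic in the last example can be sketched in Python as follows. The 2,000 RU/s provisioned throughput is an assumption inferred from the "two such requests" statement, and the function name is hypothetical.

```python
import math


def max_requests_per_second(provisioned_rus: float, charge_per_request: float) -> int:
    """Whole requests that fit in one second of provisioned throughput."""
    if charge_per_request <= 0:
        raise ValueError("charge_per_request must be positive")
    return math.floor(provisioned_rus / charge_per_request)


# Assumed: 2,000 RU/s provisioned; each query costs 1,000 RUs (from the example).
print(max_requests_per_second(2000, 1000))
```

Lowering the per-query charge, for example by adding a more selective filter, raises the number of requests the same provisioned throughput can serve.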

Next steps

Next, you can learn more about cost optimization in Azure Cosmos DB with the following articles: