排查使用 Azure Cosmos DB 时遇到的查询问题Troubleshoot query issues when using Azure Cosmos DB

适用于: SQL API

本文逐步说明排查 Azure Cosmos DB 中的查询问题的一般建议方法。This article walks through a general recommended approach for troubleshooting queries in Azure Cosmos DB. 虽然不应将本文中所述的步骤视为针对潜在查询问题的完全防御方法,但我们在其中包含了最常见的性能提示。Although you shouldn't consider the steps outlined in this article a complete defense against potential query issues, we've included the most common performance tips here. 应将本文用作起点,以排查 Azure Cosmos DB 核心 (SQL) API 中查询速度缓慢或费用较高的问题。You should use this article as a starting place for troubleshooting slow or expensive queries in the Azure Cosmos DB core (SQL) API. 还可以使用诊断日志来识别速度缓慢或消耗大量吞吐量的查询。You can also use diagnostics logs to identify queries that are slow or that consume significant amounts of throughput. 如果使用的是 Azure Cosmos DB API for MongoDB,则应使用 Azure Cosmos DB API for MongoDB 查询故障排除指南If you are using Azure Cosmos DB's API for MongoDB, you should use Azure Cosmos DB's API for MongoDB query troubleshooting guide

Azure Cosmos DB 中的查询优化大致分为以下类别:Query optimizations in Azure Cosmos DB are broadly categorized as follows:

  • 可降低查询请求单位 (RU) 费用的优化Optimizations that reduce the Request Unit (RU) charge of the query
  • 仅降低延迟的优化Optimizations that just reduce latency

如果降低查询的 RU 费用,通常还会降低延迟。If you reduce the RU charge of a query, you'll typically decrease latency as well.

本文提供可使用 nutrition 数据集重新创建的示例。This article provides examples that you can re-create by using the nutrition dataset.

常见 SDK 问题Common SDK issues

阅读本指南之前,考虑与查询引擎无关的常见 SDK 问题将很有帮助。Before reading this guide, it is helpful to consider common SDK issues that aren't related to the query engine.

  • 按照这些 SDK 性能提示进行操作。Follow these SDK Performance tips.
  • SDK 允许为查询设置 MaxItemCount,但不能指定最小项计数。The SDK allows setting a MaxItemCount for your queries but you can't specify a minimum item count.
    • 代码应处理从零到 MaxItemCount 的任意页大小。Code should handle any page size, from zero to the MaxItemCount.
  • 有时,即使未来页上包含结果,查询也可能包含空页,Sometimes queries may have empty pages even when there are results on a future page. 其原因可能包括:Reasons for this could be:
    • SDK 可能正在执行多个网络调用。The SDK could be doing multiple network calls.
    • 查询检索文档所花费的时间可能很长。The query might be taking a long time to retrieve the documents.
  • 所有查询都包含一个继续标记,该标记将允许查询继续进行。All queries have a continuation token that will allow the query to continue. 请确保完全耗尽查询。Be sure to drain the query completely. 详细了解如何处理多页结果Learn more about handling multiple pages of results

获取查询指标Get query metrics

在 Azure Cosmos DB 中优化查询时,第一步始终是获取查询的查询指标When you optimize a query in Azure Cosmos DB, the first step is always to get the query metrics for your query. 也可以通过 Azure 门户获取这些指标。These metrics are also available through the Azure portal. 在数据资源管理器中运行查询后,可在“结果”选项卡旁边看到查询指标:Once you run your query in the Data Explorer, the query metrics are visible next to the Results tab:

获取查询指标

获取查询指标后,将查询的“已检索文档计数”与“输出文档计数”进行比较 。After you get the query metrics, compare the Retrieved Document Count with the Output Document Count for your query. 使用这种比较可以确定要在本文中查看的相关部分。Use this comparison to identify the relevant sections to review in this article.

“已检索文档计数”是查询引擎需要加载的文档数。The Retrieved Document Count is the number of documents that the query engine needed to load. “输出文档计数”是查询结果所需的文档数。The Output Document Count is the number of documents that were needed for the results of the query. 如果“已检索文档计数”明显大于“输出文档计数”,则表明查询中至少有一个部分无法使用索引并需要执行扫描 。If the Retrieved Document Count is significantly higher than the Output Document Count, there was at least one part of your query that was unable to use an index and needed to do a scan.

请参阅以下部分,了解适用于你的方案的相关查询优化。Refer to the following sections to understand the relevant query optimizations for your scenario.

查询的 RU 费用过高Query's RU charge is too high

“已检索文档计数”明显大于“输出文档计数”Retrieved Document Count is significantly higher than Output Document Count


“已检索文档计数”约等于“输出文档计数”Retrieved Document Count is approximately equal to Output Document Count


查询的 RU 费用可接受,但延迟仍然过高Query's RU charge is acceptable but latency is still too high

“已检索文档计数”超过“输出文档计数”的查询Queries where Retrieved Document Count exceeds Output Document Count

“已检索文档计数”是查询引擎需要加载的文档数。The Retrieved Document Count is the number of documents that the query engine needed to load. “输出文档计数”是查询返回的文档数。The Output Document Count is the number of documents returned by the query. 如果“已检索文档计数”明显大于“输出文档计数”,则表明查询中至少有一个部分无法使用索引并需要执行扫描 。If the Retrieved Document Count is significantly higher than the Output Document Count, there was at least one part of your query that was unable to use an index and needed to do a scan.

下面是一个不完全由索引提供服务的扫描查询示例:Here's an example of scan query that wasn't entirely served by the index:

查询:Query:

SELECT VALUE c.description
FROM c
WHERE UPPER(c.description) = "BABYFOOD, DESSERT, FRUIT DESSERT, WITHOUT ASCORBIC ACID, JUNIOR"

查询指标:Query metrics:

Retrieved Document Count                 :          60,951
Retrieved Document Size                  :     399,998,938 bytes
Output Document Count                    :               7
Output Document Size                     :             510 bytes
Index Utilization                        :            0.00 %
Total Query Execution Time               :        4,500.34 milliseconds
  Query Preparation Times
    Query Compilation Time               :            0.09 milliseconds
    Logical Plan Build Time              :            0.05 milliseconds
    Physical Plan Build Time             :            0.04 milliseconds
    Query Optimization Time              :            0.01 milliseconds
  Index Lookup Time                      :            0.01 milliseconds
  Document Load Time                     :        4,177.66 milliseconds
  Runtime Execution Times
    Query Engine Times                   :          322.16 milliseconds
    System Function Execution Time       :           85.74 milliseconds
    User-defined Function Execution Time :            0.00 milliseconds
  Document Write Time                    :            0.01 milliseconds
Client Side Metrics
  Retry Count                            :               0
  Request Charge                         :        4,059.95 RUs

“已检索文档计数”(60,951) 明显大于“输出文档计数”(7),指示此查询导致了文档扫描 。The Retrieved Document Count (60,951) is significantly higher than the Output Document Count (7), implying that this query resulted in a document scan. 在本例中,系统函数 UPPER() 不使用索引。In this case, the system function UPPER() doesn't use an index.

在索引策略中包含所需的路径Include necessary paths in the indexing policy

索引策略应涵盖 WHERE 子句、ORDER BY 子句、JOIN 和大多数系统函数中包含的所有属性。Your indexing policy should cover any properties included in WHERE clauses, ORDER BY clauses, JOIN, and most system functions. 索引策略中指定的所需路径应与 JSON 文档中的属性相匹配。The desired paths specified in the index policy should match the properties in the JSON documents.

备注

Azure Cosmos DB 索引策略中的属性区分大小写Properties in Azure Cosmos DB indexing policy are case-sensitive

如果对 nutrition 数据集运行以下简单查询,你将注意到,在为 WHERE 子句中的属性编制索引时,RU 费用要低得多:If you run the following simple query on the nutrition dataset, you will observe a much lower RU charge when the property in the WHERE clause is indexed:

原始Original

查询:Query:

SELECT *
FROM c
WHERE c.description = "Malabar spinach, cooked"

索引策略:Indexing policy:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/description/*"
        }
    ]
}

RU 费用:409.51 RURU charge: 409.51 RUs

已优化Optimized

更新的索引策略:Updated indexing policy:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": []
}

RU 费用:2.98 RURU charge: 2.98 RUs

可以随时将属性添加到索引策略,而不会影响写入或读取可用性。You can add properties to the indexing policy at any time, with no effect on write or read availability. 你可以跟踪索引转换进度You can track index transformation progress.

了解哪些系统函数使用索引Understand which system functions use the index

如果表达式可以被转换为一系列字符串值,则该表达式可以使用索引,If an expression can be translated into a range of string values, it can use the index. 否则不可使用索引。Otherwise, it can't.

下面是一些可使用索引的常用字符串函数的列表:Here's the list of some common string functions that can use the index:

  • STARTSWITH(str_expr1, str_expr2, bool_expr)STARTSWITH(str_expr1, str_expr2, bool_expr)
  • CONTAINS(str_expr, str_expr, bool_expr)CONTAINS(str_expr, str_expr, bool_expr)
  • LEFT(str_expr, num_expr) = str_exprLEFT(str_expr, num_expr) = str_expr
  • SUBSTRING(str_expr, num_expr, num_expr) = str_expr,但前提是第一个 num_expr 为 0SUBSTRING(str_expr, num_expr, num_expr) = str_expr, but only if the first num_expr is 0

下面是一些不使用索引且必须加载每个文档的常用系统函数:Following are some common system functions that don't use the index and must load each document:

系统函数System function 优化意见Ideas for optimization
UPPER/LOWERUPPER/LOWER 不要使用系统函数来规范化数据以进行比较,而是在插入时规范化大小写。Instead of using the system function to normalize data for comparisons, normalize the casing upon insertion. 诸如 SELECT * FROM c WHERE UPPER(c.name) = 'BOB' 的查询将变成 SELECT * FROM c WHERE c.name = 'BOB'A query like SELECT * FROM c WHERE UPPER(c.name) = 'BOB' becomes SELECT * FROM c WHERE c.name = 'BOB'.
数学函数(非聚合)Mathematical functions (non-aggregates) 如果需要频繁计算查询中的某个值,请考虑在 JSON 文档中将此值存储为属性。If you need to compute a value frequently in your query, consider storing the value as a property in your JSON document.

即使系统函数不使用索引,查询的其他部分也仍可以使用索引。Other parts of the query might still use the index even though the system functions don't.

了解哪些聚合查询使用索引Understand which aggregate queries use the index

在大多数情况下,Azure Cosmos DB 中的聚合系统函数将使用索引。In most cases, aggregate system functions in Azure Cosmos DB will use the index. 但是,根据聚合查询中的筛选器或其他子句,可能需要查询引擎来加载大量文档。However, depending on the filters or additional clauses in an aggregate query, the query engine may be required to load a high number of documents. 通常情况下,查询引擎将首先应用相等性和范围筛选器。Typically, the query engine will apply equality and range filters first. 应用这些筛选器后,查询引擎可以评估其他筛选器,并根据需要加载其余文档以计算聚合。After applying these filters, the query engine can evaluate additional filters and resort to loading remaining documents to compute the aggregate, if needed.

例如,给定以下两个示例查询,同时具有相等性和 CONTAINS 系统函数筛选器的查询总体上要比只具有 CONTAINS 系统函数筛选器的查询更加高效。For example, given these two sample queries, the query with both an equality and CONTAINS system function filter will generally be more efficient than a query with just a CONTAINS system function filter. 这是因为在需要为费用更高的 CONTAINS 筛选器加载文档之前,首先应用了相等性筛选器且相等性筛选器使用了索引。This is because the equality filter is applied first and uses the index before documents need to be loaded for the more expensive CONTAINS filter.

仅具有 CONTAINS 筛选器的查询 - 较高的 RU 费用:Query with only CONTAINS filter - higher RU charge:

SELECT COUNT(1)
FROM c
WHERE CONTAINS(c.description, "spinach")

同时具有相等性筛选器和 CONTAINS 筛选器的查询 - 较低的 RU 费用:Query with both equality filter and CONTAINS filter - lower RU charge:

SELECT AVG(c._ts)
FROM c
WHERE c.foodGroup = "Sausages and Luncheon Meats" AND CONTAINS(c.description, "spinach")

下面是不会完全使用索引的聚合查询的其他示例:Here are additional examples of aggregate queries that will not fully use the index:

具有系统函数且不使用索引的查询Queries with system functions that don't use the index

应参阅相关系统函数页以查看其是否使用索引。You should refer to the relevant system function's page to see if it uses the index.

SELECT MAX(c._ts)
FROM c
WHERE CONTAINS(c.description, "spinach")

具有用户定义函数 (UDF) 的聚合查询Aggregate queries with user-defined functions(UDF's)

SELECT AVG(c._ts)
FROM c
WHERE udf.MyUDF("Sausages and Luncheon Meats")

具有 GROUP BY 的查询Queries with GROUP BY

随着 GROUP BY 子句中属性基数的增加,具有 GROUP BY 的查询的 RU 费用也将增加。The RU charge of queries with GROUP BY will increase as the cardinality of the properties in the GROUP BY clause increases. 例如,在下面的查询中,查询的 RU 费用将随着数值唯一描述的增加而增加。In the below query, for example, the RU charge of the query will increase as the number unique descriptions increases.

具有 GROUP BY 子句的聚合函数的 RU 费用将高于单独的聚合函数的 RU 费用。The RU charge of an aggregate function with a GROUP BY clause will be higher than the RU charge of an aggregate function alone. 在此示例中,查询引擎必须加载与 c.foodGroup = "Sausages and Luncheon Meats" 筛选器匹配的每个文档,因此 RU 费用预计会很高。In this example, the query engine must load every document that matches the c.foodGroup = "Sausages and Luncheon Meats" filter so the RU charge is expected to be high.

SELECT COUNT(1)
FROM c
WHERE c.foodGroup = "Sausages and Luncheon Meats"
GROUP BY c.description

如果计划频繁运行相同的聚合查询,则与运行单个查询相比,使用 Azure Cosmos DB 更改源生成实时具体化视图可能更加高效。If you plan to frequently run the same aggregate queries, it may be more efficient to build a real-time materialized view with the Azure Cosmos DB change feed than running individual queries.

优化同时具有筛选器和 ORDER BY 子句的查询Optimize queries that have both a filter and an ORDER BY clause

虽然具有筛选器和 ORDER BY 子句的查询通常使用范围索引,但如果能够通过组合索引提供这些查询,这些查询将会更加高效。Although queries that have a filter and an ORDER BY clause will normally use a range index, they'll be more efficient if they can be served from a composite index. 除了修改索引策略以外,还应将组合索引中的所有属性添加到 ORDER BY 子句。In addition to modifying the indexing policy, you should add all properties in the composite index to the ORDER BY clause. 对查询进行的此更改可确保该查询使用组合索引。This change to the query will ensure that it uses the composite index. 可以通过对 nutrition 数据集运行查询来观察影响:You can observe the impact by running a query on the nutrition dataset:

原始Original

查询:Query:

SELECT *
FROM c
WHERE c.foodGroup = "Soups, Sauces, and Gravies"
ORDER BY c._ts ASC

索引策略:Indexing policy:

{

        "automatic":true,
        "indexingMode":"Consistent",
        "includedPaths":[  
            {  
                "path":"/*"
            }
        ],
        "excludedPaths":[]
}

RU 费用:44.28 RURU charge: 44.28 RUs

已优化Optimized

更新的查询(包含 ORDER BY 子句中的两个属性):Updated query (includes both properties in the ORDER BY clause):

SELECT *
FROM c
WHERE c.foodGroup = "Soups, Sauces, and Gravies"
ORDER BY c.foodGroup, c._ts ASC

更新的索引策略:Updated indexing policy:

{  
        "automatic":true,
        "indexingMode":"Consistent",
        "includedPaths":[  
            {  
                "path":"/*"
            }
        ],
        "excludedPaths":[],
        "compositeIndexes":[  
            [  
                {  
                    "path":"/foodGroup",
                    "order":"ascending"
        },
                {  
                    "path":"/_ts",
                    "order":"ascending"
                }
            ]
        ]
    }

RU 费用:8.86 RURU charge: 8.86 RUs

使用子查询优化 JOIN 表达式Optimize JOIN expressions by using a subquery

多值子查询可以通过在 WHERE 子句中的每个 select-many 表达式后面(而不是所有交叉联接后面)推送谓词,来优化 JOIN 表达式。Multi-value subqueries can optimize JOIN expressions by pushing predicates after each select-many expression rather than after all cross joins in the WHERE clause.

请看下面的查询:Consider this query:

SELECT Count(1) AS Count
FROM c
JOIN t IN c.tags
JOIN n IN c.nutrients
JOIN s IN c.servings
WHERE t.name = 'infant formula' AND (n.nutritionValue > 0
AND n.nutritionValue < 10) AND s.amount > 1

RU 费用:167.62 RURU charge: 167.62 RUs

对于此查询,索引将匹配包含名为 infant formulanutritionValue 大于 0 且 amount 大于 1 的标记的任何文档。For this query, the index will match any document that has a tag with the name infant formula, nutritionValue greater than 0, and amount greater than 1. 此处的 JOIN 表达式将在应用任何筛选器之前,对每个匹配文档执行 tags、nutrients 和 servings 数组的所有项的叉积计算。The JOIN expression here will perform the cross-product of all items of tags, nutrients, and servings arrays for each matching document before any filter is applied. 然后,WHERE 子句将对每个 <c, t, n, s> 元组应用筛选谓词。The WHERE clause will then apply the filter predicate on each <c, t, n, s> tuple.

例如,如果匹配文档在这三个数组中的每个数组中都具有 10 个项,则该文档将扩展为 1 x 10 x 10 x 10(即 1000)个元组。For example, if a matching document has 10 items in each of the three arrays, it will expand to 1 x 10 x 10 x 10 (that is, 1,000) tuples. 在此处使用子查询可帮助在与下一个表达式联接之前,筛选出联接的数组项。The use of subqueries here can help to filter out joined array items before joining with the next expression.

此查询等效于前面的查询,但使用了子查询:This query is equivalent to the preceding one but uses subqueries:

SELECT Count(1) AS Count
FROM c
JOIN (SELECT VALUE t FROM t IN c.tags WHERE t.name = 'infant formula')
JOIN (SELECT VALUE n FROM n IN c.nutrients WHERE n.nutritionValue > 0 AND n.nutritionValue < 10)
JOIN (SELECT VALUE s FROM s IN c.servings WHERE s.amount > 1)

RU 费用:22.17 RURU charge: 22.17 RUs

假设 tags 数组中只有一个项与筛选器相匹配,而 nutrients 和 servings 数组各有 5 个项。Assume that only one item in the tags array matches the filter and that there are five items for both the nutrients and servings arrays. 那么,JOIN 表达式将扩展为 1 x 1 x 5 x 5 = 25 个项,而不是第一个查询中的 1000 个项。The JOIN expressions will expand to 1 x 1 x 5 x 5 = 25 items, as opposed to 1,000 items in the first query.

“已检索文档计数”等于“输出文档计数”的查询Queries where Retrieved Document Count is equal to Output Document Count

如果“已检索文档计数”约等于“输出文档计数”,则查询引擎无需扫描许多不必要的文档 。If the Retrieved Document Count is approximately equal to the Output Document Count, the query engine didn't have to scan many unnecessary documents. 对于许多查询(例如,使用 TOP 关键字的查询)而言,“已检索文档计数”可能要比“输出文档计数”多 1 。For many queries, like those that use the TOP keyword, Retrieved Document Count might exceed Output Document Count by 1. 你无需为此担心。You don't need to be concerned about this.

最大限度地减少跨分区查询Minimize cross partition queries

当请求单位和数据存储需求增加时,Azure Cosmos DB 将使用分区来扩展单个容器。Azure Cosmos DB uses partitioning to scale individual containers as Request Unit and data storage needs increase. 每个物理分区具有不同的独立索引。Each physical partition has a separate and independent index. 如果查询具有匹配容器分区键的相等性筛选器,则你只需检查相关分区的索引。If your query has an equality filter that matches your container's partition key, you'll need to check only the relevant partition's index. 这种优化可以减少查询所需的 RU 总数。This optimization reduces the total number of RUs that the query requires.

如果有大量预配的 RU(超过 30,000)或存储了大量数据(约超过 100 GB),那么你可能需要有一个足够大的容器,以查看查询 RU 费用的显著降低。If you have a large number of provisioned RUs (more than 30,000) or a large amount of data stored (more than approximately 100 GB), you probably have a large enough container to see a significant reduction in query RU charges.

例如,如果使用分区键 foodGroup 创建容器,则以下查询只需检查单个物理分区:For example, if you create a container with the partition key foodGroup, the following queries will need to check only a single physical partition:

SELECT *
FROM c
WHERE c.foodGroup = "Soups, Sauces, and Gravies" and c.description = "Mushroom, oyster, raw"

具有带分区键的 IN 筛选器的查询将仅检查相关的物理分区,而不会“扇出”:Queries that have an IN filter with the partition key will only check the relevant physical partition(s) and will not "fan-out":

SELECT *
FROM c
WHERE c.foodGroup IN("Soups, Sauces, and Gravies", "Vegetables and Vegetable Products") and c.description = "Mushroom, oyster, raw"

对分区键使用范围筛选器或者对分区键不使用任何筛选器的查询将需要“扇出”,并检查每个物理分区的索引,以查看结果:Queries that have range filters on the partition key, or that don't have any filters on the partition key, will need to "fan-out" and check every physical partition's index for results:

SELECT *
FROM c
WHERE c.description = "Mushroom, oyster, raw"
SELECT *
FROM c
WHERE c.foodGroup > "Soups, Sauces, and Gravies" and c.description = "Mushroom, oyster, raw"

优化对多个属性具有筛选器的查询Optimize queries that have filters on multiple properties

虽然对多个属性具有筛选器的查询通常使用范围索引,但如果能够通过组合索引提供这些查询,这些查询将会更加高效。Although queries that have filters on multiple properties will normally use a range index, they'll be more efficient if they can be served from a composite index. 对于少量数据而言,这种优化不会产生明显影响。For small amounts of data, this optimization won't have a significant impact. 但对于大量数据,它可能非常有用。It could be useful, however, for large amounts of data. 对于每个组合索引,最多只能优化一个非相等性筛选器。You can only optimize, at most, one non-equality filter per composite index. 如果查询具有多个非相等性筛选器,请选取其中一个将使用组合索引的筛选器。If your query has multiple non-equality filters, pick one of them that will use the composite index. 其余筛选器将继续使用范围索引。The rest will continue to use range indexes. 非相等性筛选器必须在组合索引中最后定义。The non-equality filter must be defined last in the composite index. 详细了解组合索引Learn more about composite indexes.

下面是一些可以通过组合索引进行优化的查询示例:Here are some examples of queries that could be optimized with a composite index:

SELECT *
FROM c
WHERE c.foodGroup = "Vegetables and Vegetable Products" AND c._ts = 1575503264
SELECT *
FROM c
WHERE c.foodGroup = "Vegetables and Vegetable Products" AND c._ts > 1575503264

下面是相关的组合索引:Here's the relevant composite index:

{  
        "automatic":true,
        "indexingMode":"Consistent",
        "includedPaths":[  
            {  
                "path":"/*"
            }
        ],
        "excludedPaths":[],
        "compositeIndexes":[  
            [  
                {  
                    "path":"/foodGroup",
                    "order":"ascending"
                },
                {  
                    "path":"/_ts",
                    "order":"ascending"
                }
            ]
        ]
}

可降低查询延迟的优化Optimizations that reduce query latency

在许多情况下,当查询延迟仍然过高时,RU 费用是可接受的。In many cases, the RU charge might be acceptable when query latency is still too high. 以下部分概述了降低查询延迟的提示。The following sections give an overview of tips for reducing query latency. 如果对同一个数据集多次运行同一个查询,该查询通常每次都会产生相同的 RU 开销。If you run the same query multiple times on the same dataset, it will typically have the same RU charge each time. 但是,每次执行查询时,查询延迟可能各不相同。But query latency might vary between query executions.

提高邻近度Improve proximity

在非 Azure Cosmos DB 帐户的区域中运行的查询,比在同一区域中运行的查询的延迟更高。Queries that are run from a different region than the Azure Cosmos DB account will have higher latency than if they were run inside the same region. 例如,如果在台式机上运行代码,则延迟比从 Azure Cosmos DB 所在的同一 Azure 区域中的某个虚拟机上运行查询要高出几十或几百毫秒(或更高)。For example, if you're running code on your desktop computer, you should expect latency to be tens or hundreds of milliseconds higher (or more) than if the query came from a virtual machine within the same Azure region as Azure Cosmos DB. 可以轻松地在 Azure Cosmos DB 中以多区域方式分布数据,以确保将数据放在更靠近应用的位置。It's simple to multiple-regionally distribute data in Azure Cosmos DB to ensure you can bring your data closer to your app.

增大预配吞吐量Increase provisioned throughput

在 Azure Cosmos DB 中,预配的吞吐量以请求单位 (RU) 计量。In Azure Cosmos DB, your provisioned throughput is measured in Request Units (RUs). 假设某个查询消耗 5 RU 吞吐量。Imagine you have a query that consumes 5 RUs of throughput. 如果预配 1000 RU,则每秒可以运行该查询 200 次。For example, if you provision 1,000 RUs, you would be able to run that query 200 times per second. 如果在没有足够可用吞吐量的情况下尝试运行查询,Azure Cosmos DB 将返回 HTTP 429 错误。If you tried to run the query when there wasn't enough throughput available, Azure Cosmos DB would return an HTTP 429 error. 任何当前核心 (SQL) API SDK 在等待一小段时间后,都将自动重试该查询。Any of the current Core (SQL) API SDKs will automatically retry this query after waiting for a short time. 受限制的请求需要花费更长的时间,因此增加预配的吞吐量可以改进查询延迟。Throttled requests take longer, so increasing provisioned throughput can improve query latency. 可以在 Azure 门户的“指标”边栏选项卡上观察受限制的请求总数You can observe the total number of throttled requests on the Metrics blade of the Azure portal.

增加 MaxConcurrencyIncrease MaxConcurrency

并行查询的方式是并行查询多个分区。Parallel queries work by querying multiple partitions in parallel. 但就查询本身而言,会按顺序提取单个已分区集合中的数据。But data from an individual partitioned collection is fetched serially with respect to the query. 因此,如果将 MaxConcurrency 设置为分区数,则最有可能实现查询的最高性能,但前提是所有其他系统条件仍保持不变。So, if you set MaxConcurrency to the number of partitions, you have the best chance of achieving the most performant query, provided all other system conditions remain the same. 如果不知道分区数,可将 MaxConcurrency(或旧 SDK 版本中的 MaxDegreesOfParallelism)设置为较大的数字。If you don't know the number of partitions, you can set MaxConcurrency (or MaxDegreesOfParallelism in older SDK versions) to a high number. 系统会选择最小值(分区数、用户提供的输入)作为最大并行度。The system will choose the minimum (number of partitions, user provided input) as the maximum degree of parallelism.

增加 MaxBufferedItemCountIncrease MaxBufferedItemCount

查询专用于在客户端处理当前一批结果时预提取结果。Queries are designed to pre-fetch results while the current batch of results is being processed by the client. 预提取可帮助改进查询的总体延迟。Pre-fetching helps to improve the overall latency of a query. 设置 MaxBufferedItemCount 会限制预提取结果的数目。Setting MaxBufferedItemCount limits the number of pre-fetched results. 如果将此值设置为预期返回的结果数(或更大的数目),则查询从预提取中获益的程度将最大。If you set this value to the expected number of results returned (or a higher number), the query can get the most benefit from pre-fetching. 如果将此值设置为 -1,则系统将自动确定要缓冲的项数。If you set this value to -1, the system will automatically determine the number of items to buffer.

后续步骤Next steps

参阅以下文章,了解有关如何按查询度量 RU、获取执行统计信息以优化查询等信息:See the following articles for information on how to measure RUs per query, get execution statistics to tune your queries, and more: