使用结果集缓存优化性能Performance tuning with result set caching

启用结果集缓存后,Synapse SQL 会自动将查询结果缓存到用户数据库中,以供重复使用。When result set caching is enabled, Synapse SQL automatically caches query results in the user database for repetitive use. 这样,后续的查询执行就能直接从持久性缓存中获取结果,因此无需重新计算。This allows subsequent query executions to get results directly from the persisted cache so recomputation is not needed. 结果集缓存提高了查询性能,并减少了计算资源的用量。Result set caching improves query performance and reduces compute resource usage. 此外,使用缓存结果集的查询不会占用任何并发槽,因此不会计入现有的并发限制。In addition, queries using cached results set do not use any concurrency slots and thus do not count against existing concurrency limits. 出于安全考虑,如果访问方用户的数据访问权限与创建缓存结果的用户相同,则访问方用户只能访问缓存的结果。For security, users can only access the cached results if they have the same data access permissions as the users creating the cached results.

关键命令Key commands

对用户数据库启用/禁用结果集缓存Turn ON/OFF result set caching for a user database

对会话启用/禁用结果集缓存Turn ON/OFF result set caching for a session

检查缓存结果集的大小Check the size of cached result set

清理缓存Clean up the cache

不缓存哪些内容What's not cached

对数据库启用结果集缓存后,在缓存填满之前将缓存所有查询的结果,但以下查询除外:Once result set caching is turned ON for a database, results are cached for all queries until the cache is full, except for these queries:

  • 带有内置函数或运行时表达式的查询,这些表达式为非确定表达式,即使基表的数据或查询中没有更改也是如此。Queries with built-in functions or runtime expressions that are non-deterministic even when there�s no change in base tables� data or query. 例如,DateTime.Now()、GetDate()。For example, DateTime.Now(), GetDate().
  • 使用用户定义的函数的查询Queries using user defined functions
  • 使用启用了行级安全性或列级安全性的表的查询Queries using tables with row level security or column level security enabled
  • 其返回数据中的行大小超过 64KB 的查询Queries returning data with row size larger than 64KB
  • 返回大数据(大于 10 GB)的查询Queries returning large data in size (>10GB)

备注

  • 某些非确定性函数和运行时表达式对相同数据的重复查询来说可以是确定性。Some non-deterministic functions and runtime expressions can be deterministic to repetitive queries against the same data. 例如,ROW_NUMBER()。For example, ROW_NUMBER().
  • 如果查询结果集中的行顺序/序列对应用程序逻辑很重要,请在查询中使用 ORDER BY。Use ORDER BY in your query if the order/sequence of rows in the query result set is important to your application logic.
  • 如果 ORDER BY 列中的数据不是唯一的,无论是否启用或禁用结果集缓存,都不能保证 ORDER BY 列中值相同的行的行顺序。If data in the ORDER BY columns are not unique, there's no garanteed row order for rows with the same values in the ORDER BY columns, regardless if result set caching is enabled or disabled.

重要

创建结果集缓存以及从缓存中检索数据的操作在 Synapse SQL 池实例的控制节点上进行。The operations to create result set cache and retrieve data from the cache happen on the control node of a Synapse SQL pool instance. 当结果集缓存处于打开状态时,运行返回大型结果集(例如,超过 1 GB)的查询可能会导致控制节点上带宽限制较高,并降低实例上的整体查询响应速度。When result set caching is turned ON, running queries that return large result set (for example, >1GB) can cause high throttling on the control node and slow down the overall query response on the instance. 这些查询通常在数据浏览或 ETL 操作过程中使用。Those queries are commonly used during data exploration or ETL operations. 若要避免对控制节点造成压力并导致性能问题,用户应在运行此类查询之前关闭数据库的结果集缓存。To avoid stressing the control node and cause performance issue, users should turn OFF result set caching on the database before running those types of queries.

此查询的运行持续时间以针对某个查询执行结果集缓存操作所需的时间为宜:Run this query for the time taken by result set caching operations for a query:

SELECT step_index, operation_type, location_type, status, total_elapsed_time, command
FROM sys.dm_pdw_request_steps
WHERE request_id  = <'request_id'>;

下面是在禁用结果集缓存的情况上执行的某个查询的示例输出。Here is an example output for a query executed with result set caching disabled.

屏幕截图显示了查询结果,其中包括位置类型和命令。

下面是在启用结果集缓存的情况上执行的某个查询的示例输出。Here is an example output for a query executed with result set caching enabled.

屏幕截图显示了查询结果,其中标注了命令从 [D W ResultCache D b] dot d b o 选择的 *。

何时使用缓存结果When cached results are used

如果满足以下所有要求,则会对查询重复使用缓存结果集:Cached result set is reused for a query if all of the following requirements are all met:

  • 运行查询的用户有权访问该查询中引用的所有表。The user who's running the query has access to all the tables referenced in the query.
  • 新查询与生成结果集缓存的前一查询之间完全匹配。There is an exact match between the new query and the previous query that generated the result set cache.
  • 从中生成缓存结果集的表中未发生任何数据或架构更改。There is no data or schema changes in the tables where the cached result set was generated from.

运行以下命令可以检查某个查询是否已执行并出现结果缓存命中或未命中情况。Run this command to check if a query was executed with a result cache hit or miss. result_cache_hit 列对缓存命中返回 1,对缓存失误返回 0,而对未使用结果集缓存的原因,则返回负值。The result_cache_hit column returns 1 for cache hit, 0 for cache miss, and negative values for reasons why result set caching was not used. 有关详细信息,请检查 sys.dm_pdw_exec_requestsCheck sys.dm_pdw_exec_requests for details.

SELECT request_id, command, result_cache_hit FROM sys.dm_pdw_exec_requests
WHERE request_id = <'Your_Query_Request_ID'>

管理缓存结果Manage cached results

每个数据库的结果集缓存最大大小为 1 TB。The maximum size of result set cache is 1 TB per database. 当基础查询数据更改时,缓存的结果将自动失效。The cached results are automatically invalidated when the underlying query data change.

缓存逐出由 Synapse SQL 按照以下计划自动管理:The cache eviction is managed by Synapse SQL automatically following this schedule:

  • 尚未使用结果集或已失效(每 48 小时执行一次)。Every 48 hours if the result set hasn't been used or has been invalidated.
  • 当结果集缓存接近最大大小时。When the result set cache approaches the maximum size.

用户可以使用以下选项之一手动清空整个结果集缓存:Users can manually empty the entire result set cache by using one of these options:

  • 对数据库禁用结果集缓存功能Turn OFF the result set cache feature for the database
  • 连接到数据库后运行 DBCC DROPRESULTSETCACHERun DBCC DROPRESULTSETCACHE while connected to the database

暂停数据库不会清空缓存结果集。Pausing a database won't empty cached result set.

后续步骤Next steps

有关更多开发技巧,请参阅 开发概述For more development tips, see development overview.