通过具体化视图进行性能优化Performance tuning with materialized views

Synapse SQL 池中的具体化视图为复杂的分析查询提供了一种低维护的方法,可以在不改变任何查询的情况下获得快速的性能。Materialized views in Synapse SQL pool provide a low maintenance method for complex analytical queries to get fast performance without any query change. 本文讨论了使用具体化视图的一般指南。This article discusses the general guidance on using materialized views.

具体化视图与标准视图Materialized views vs. standard views

SQL 池支持标准视图和具体化视图。SQL pool supports standard and materialized views. 两者都是用 SELECT 表达式创建并以逻辑表的形式呈现给查询的虚拟表。Both are virtual tables created with SELECT expressions and presented to queries as logical tables. 视图封装了常见数据计算的复杂性,并为计算更改添加了一个抽象层,因此无需重写查询。Views encapsulate the complexity of common data computation and add an abstraction layer to computation changes so there's no need to rewrite queries.

每次使用标准视图时,该视图都会计算其数据。A standard view computes its data each time when the view is used. 磁盘上不存储任何数据。There's no data stored on disk. 人们通常使用标准视图作为工具,帮助组织数据库中的逻辑对象和查询。People typically use standard views as a tool that helps organize the logical objects and queries in a database. 若要使用标准视图,查询需要直接引用它。To use a standard view, a query needs to make direct reference to it.

具体化视图像表一样在 SQL 池中预先计算、存储和维护其数据。A materialized view pre-computes, stores, and maintains its data in SQL pool just like a table. 每次使用具体化视图时都不需要重新计算。There's no recomputation needed each time when a materialized view is used. 这就是为什么使用具体化视图中的全部或部分数据的查询可以获得更快的性能。That's why queries that use all or subset of the data in materialized views can get faster performance. 更好的是,查询可以使用具体化视图,而无需直接引用它,因此无需更改应用程序代码。Even better, queries can use a materialized view without making direct reference to it, so there's no need to change application code.

标准视图上的大多数要求仍然适用于具体化视图。Most of the requirements on a standard view still apply to a materialized view. 有关具体化视图语法和其他要求的详细信息,请参阅 CREATE MATERIALIZED VIEW AS SELECTFor details on the materialized view syntax and other requirements, refer to CREATE MATERIALIZED VIEW AS SELECT

比较Comparison 查看View 具体化视图Materialized View
查看定义View definition 存储在 SQL 池中。Stored in SQL pool. 存储在 SQL 池中。Stored in SQL pool.
查看内容View content 在每次使用视图时生成。Generated each time when the view is used. 在视图创建期间,预处理并存储在 SQL 池中。Pre-processed and stored in SQL pool during view creation. 随着数据添加到基础表中而更新。Updated as data is added to the underlying tables.
数据刷新Data refresh 始终更新Always updated 始终更新Always updated
从复杂查询检索视图数据的速度Speed to retrieve view data from complex queries Slow FastFast
额外存储Extra storage No Yes
语法Syntax CREATE VIEWCREATE VIEW CREATE MATERIALIZED VIEW AS SELECTCREATE MATERIALIZED VIEW AS SELECT

使用具体化视图的优点Benefits of using materialized views

设计合理的具体化视图具有以下优势:A properly designed materialized view provides the following benefits:

  • 使用联接和聚合函数减少复杂查询的执行时间。Reduce the execution time for complex queries with JOINs and aggregate functions. 查询越复杂,就越有可能节省执行时间。The more complex the query, the higher the potential for execution-time saving. 当查询的计算成本很高而生成的数据集很小时,可以获得最大的优势。The most benefit is gained when a query's computation cost is high and the resulting data set is small.
  • SQL 池中的优化器可以自动使用已部署的具体化视图来改进查询执行计划。The optimizer in SQL pool can automatically use deployed materialized views to improve query execution plans. 此过程对提供更快查询性能的用户是透明的,并且不需要查询直接引用具体化视图。This process is transparent to users providing faster query performance and doesn't require queries to make direct reference to the materialized views.
  • 需要对视图进行低维护。Require low maintenance on the views. 所有来自基表的增量数据更改将以同步方式自动添加到具体化视图中。All incremental data changes from the base tables are automatically added to the materialized views in a synchronous manner. 这种设计可以使查询具体化视图后返回的数据与直接查询基表后返回的数据相同。This design allows querying materialized views to return the same data as directly querying the base tables.
  • 具体化视图中的数据可以采用与基表不同的方式分发。The data in a materialized view can be distributed differently from the base tables.
  • 具体化视图中的数据与常规表中的数据具有相同的高可用性和复原优势。Data in materialized views gets the same high availability and resiliency benefits as data in regular tables.

在 SQL 池中实现的具体化视图还具有以下额外优势:The materialized views implemented in SQL pool also provide the following additional benefits:

与其他数据仓库提供程序相比,在 Azure SQL 数据仓库中实现的具体化视图还具有以下附加优势:Comparing to other data warehouse providers, the materialized views implemented in Azure SQL Data Warehouse also provide the following additional benefits:

常见方案Common scenarios

具体化视图通常用于以下情况:Materialized views are typically used in following scenarios:

需要提高大规模数据的复杂分析查询的性能Need to improve the performance of complex analytical queries against large data in size

复杂分析查询通常使用更多的聚合函数和表联接,因此在查询执行过程中会产生更多的计算密集型操作,如随机排布和联接。Complex analytical queries typically use more aggregation functions and table joins, causing more compute-heavy operations such as shuffles and joins in query execution. 这就是为什么这些查询需要更长时间才能完成,特别是对于大型表。That's why those queries take longer to complete, specially on large tables.

用户可以为查询的公共计算返回的数据创建具体化视图,因此当查询需要此数据时无需重新计算,从而降低计算成本并加快查询响应速度。Users can create materialized views for the data returned from the common computations of queries, so there's no recomputation needed when this data is needed by queries, allowing lower compute cost and faster query response.

在没有或只有少量查询更改的情况下需要更快的性能Need faster performance with no or minimum query changes

为支持常规 ETL 操作和报告,SQL 池中的模式和查询更改通常保持在最低限度。Schema and query changes in SQL pools are typically kept to a minimum to support regular ETL operations and reporting. 如果查询性能的提高可以弥补视图所产生的成本,则可以使用具体化视图来对查询性能进行优化。People can use materialized views for query performance tuning, if the cost incurred by the views can be offset by the gain in query performance.

与缩放和统计信息管理等其他优化选项相比,创建和维护具体化视图对生产的影响要小得多,而且潜在的性能增益也更高。In comparison to other tuning options such as scaling and statistics management, it's a much less impactful production change to create and maintain a materialized view and its potential performance gain is also higher.

  • 创建或维护具体化视图不会影响对基表运行的查询。Creating or maintaining materialized views does not impact the queries running against the base tables.
  • 查询优化器可以自动使用已部署的具体化视图,而无需在查询中直接引用视图。The query optimizer can automatically use the deployed materialized views without direct view reference in a query. 此功能减少了性能优化中查询更改的需要。This capability reduces the need for query change in performance tuning.

需要不同的数据分布策略来提高查询性能Need different data distribution strategy for faster query performance

SQL 池是一个分布式大规模并行处理 (MPP) 系统。SQL pool is a distributed massively parallel processing (MPP) system. SQL 池表中的数据使用三种分布策略(hash、round_robin 或 replicated)中的一种在 60 个节点上分布。Data in a SQL pool table is distributed across 60 nodes using one of three distribution strategies (hash, round_robin, or replicated).

数据分布在表创建时进行指定,并且在删除表之前保持不变。The data distribution is specified at the table creation time and stays unchanged until the table is dropped. 具体化视图是磁盘上的虚拟表,支持 hash 和 round_robin 数据分布。Materialized view being a virtual table on disk supports hash and round_robin data distributions. 用户可以选择符合后列特征的数据分布:与基表不同但对于经常使用视图的查询而言是最优的。Users can choose a data distribution that is different from the base tables but optimal for the performance of queries that use the views most.

设计指南Design guidance

下面是关于使用具体化视图提高查询性能的一般指南:Here is the general guidance on using materialized views to improve query performance:

针对工作负载设计Design for your workload

在开始创建具体化视图之前,必须从查询模式、重要性、频率和生成的数据大小等方面深入了解工作负载。Before you begin to create materialized views, it's important to have a deep understanding of your workload in terms of query patterns, importance, frequency, and the size of resulting data.

用户可以对查询优化器建议的具体化视图运行 EXPLAIN WITH_RECOMMENDATIONS <SQL_statement>。Users can run EXPLAIN WITH_RECOMMENDATIONS <SQL_statement> for the materialized views recommended by the query optimizer. 由于这些建议特定于查询,因此对一个查询有利的具体化视图可能并不适合同一工作负载中的其他查询。Since these recommendations are query-specific, a materialized view that benefits a single query may not be optimal for other queries in the same workload.

评估这些建议时应考虑工作负载需要。Evaluate these recommendations with your workload needs in mind. 理想的具体化视图是那些有利于工作负载性能的视图。The ideal materialized views are those that benefit the workload's performance.

注意快速查询和成本之间的权衡Be aware of the tradeoff between faster queries and the cost

每个具体化视图都有相应的数据存储成本和视图维护成本。For each materialized view, there's a data storage cost and a cost for maintaining the view. 当基表中的数据更改时,具体化视图的大小会增加,其物理结构也会改变。As data changes in base tables, the size of the materialized view increases and its physical structure also changes. 为了避免查询性能下降,每个具体化视图都由 SQL 池引擎单独维护。To avoid query performance degradation, each materialized view is maintained separately by the SQL pool engine.

当具体化视图和基表更改的数量增加时,维护工作负载会增加。The maintenance workload gets higher when the number of materialized views and base table changes increase. 用户应该检查查询性能的提高是否可以弥补所有具体化视图所产生的成本。Users should check if the cost incurred from all materialized views can be offset by the query performance gain.

可以对数据库中的具体化视图列表运行此查询:You can run this query for the list of materialized view in a database:

SELECT V.name as materialized_view, V.object_id
FROM sys.views V
JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2;

减少具体化视图数量的选项:Options to reduce the number of materialized views:

  • 确定工作负载中的复杂查询经常使用的常见数据集。Identify common data sets frequently used by the complex queries in your workload. 创建具体化视图来存储这些数据集,以便优化器可以在创建执行计划时将它们用作构建基块。Create materialized views to store those data sets so the optimizer can use them as building blocks when creating execution plans.

  • 删除使用率低或不再需要的具体化视图。Drop the materialized views that have low usage or are no longer needed. 禁用的具体化视图不会得到维护,但仍会产生存储成本。A disabled materialized view is not maintained but it still incurs storage cost.

  • 合并在相同或相似基表上创建的具体化视图,即使它们的数据没有重叠。Combine materialized views created on the same or similar base tables even if their data doesn't overlap. 合并具体化视图可能导致视图的大小大于单独视图的总和,但是视图维护成本应该会降低。Combining materialized views could result in a larger view in size than the sum of the separate views, however the view maintenance cost should reduce. 例如:For example:


-- Query 1 would benefit from having a materialized view created with this SELECT statement

SELECT A, SUM(B)
FROM T
GROUP BY A

-- Query 2 would benefit from having a materialized view created with this SELECT statement

SELECT C, SUM(D)
FROM T
GROUP BY C

-- You could create a single mateiralized view of this form

SELECT A, C, SUM(B), SUM(D)
FROM T
GROUP BY A, C

并非所有性能优化都需要更改查询Not all performance tuning requires query change

SQL 池优化器可以自动使用已部署的具体化视图来提高查询性能。The SQL pool optimizer can automatically use deployed materialized views to improve query performance. 此支持透明地应用于不引用视图的查询和使用具体化视图创建中不支持的聚合的查询。This support is applied transparently to queries that don't reference the views and queries that use aggregates unsupported in materialized views creation. 无需更改任何查询。No query change is needed. 可以检查查询的预估执行计划,确认是否使用了具体化视图。You can check a query's estimated execution plan to confirm if a materialized view is used.

监视具体化视图Monitor materialized views

具体化视图存储在 SQL 池中,就像具有聚集列存储索引 (CCI) 的表一样。A materialized view is stored in the SQL pool just like a table with clustered columnstore index (CCI). 从具体化视图读取数据包括扫描 CCI 索引段和应用基表中的任何增量更改。Reading data from a materialized view includes scanning the CCI index segments and applying any incremental changes from base tables. 当增量更改的数量太多时,从具体化视图解析查询可能比直接查询基表花费更长的时间。When the number of incremental changes is too high, resolving a query from a materialized view can take longer than directly querying the base tables.

若要避免查询性能下降,最好运行 DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD 来监视视图的 overhead_ratio (total_rows / max(1, base_view_row))。To avoid query performance degradation, it's a good practice to run DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD to monitor the view's overhead_ratio (total_rows / max(1, base_view_row)). 如果具体化视图的 overhead_ratio 过高,则用户应重新生成该视图。Users should REBUILD the materialized view if its overhead_ratio is too high.

具体化视图和结果集缓存Materialized view and result set caching

SQL 池中同时引入了这两项功能,用于优化查询性能。These two features are introduced in SQL pool around the same time for query performance tuning. 结果集缓存用于从对静态数据的重复查询中获得高并发性和快速响应。Result set caching is used for getting high concurrency and fast response from repetitive queries against static data.

为使用缓存结果,请求缓存的查询的形式必须与生成缓存的查询匹配。To use the cached result, the form of the cache requesting query must match with the query that produced the cache. 此外,缓存的结果必须应用于整个查询。In addition, the cached result must apply to the entire query.

具体化视图允许基表中的数据更改。Materialized views allow data changes in the base tables. 具体化视图中的数据可以应用于查询的一部分。Data in materialized views can be applied to a piece of a query. 借助这一支持,使用相同的某些计算的不同查询可使用相同的具体化视图,以获得更快的性能。This support allows the same materialized views to be used by different queries that share some computation for faster performance.

示例Example

此示例使用类似于 TPCDS 的查询,该查询查找通过产品目录购物花费比在商店中的花费更多的客户,并识别优选客户及其所在国家/地区。This example uses a TPCDS-like query that finds customers who spend more money via catalog than in stores, identify the preferred customers and their country of origin. 查询包括从涉及 SUM() 和 GROUP BY 的三个子 SELECT 语句的并集中选择前 100 条记录。The query involves selecting TOP 100 records from the UNION of three sub-SELECT statements involving SUM() and GROUP BY.

WITH year_total AS (
SELECT c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
       ,'s' sale_type
FROM customer
     ,store_sales
     ,date_dim
WHERE c_customer_sk = ss_customer_sk
   AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
UNION ALL
SELECT c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
       ,'c' sale_type
FROM customer
     ,catalog_sales
     ,date_dim
WHERE c_customer_sk = cs_bill_customer_sk
   AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
UNION ALL
SELECT c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
       ,'w' sale_type
FROM customer
     ,web_sales
     ,date_dim
WHERE c_customer_sk = ws_bill_customer_sk
   AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
         )
  SELECT TOP 100
                  t_s_secyear.customer_id
                 ,t_s_secyear.customer_first_name
                 ,t_s_secyear.customer_last_name
                 ,t_s_secyear.customer_birth_country
FROM year_total t_s_firstyear
     ,year_total t_s_secyear
     ,year_total t_c_firstyear
     ,year_total t_c_secyear
     ,year_total t_w_firstyear
     ,year_total t_w_secyear
WHERE t_s_secyear.customer_id = t_s_firstyear.customer_id
   AND t_s_firstyear.customer_id = t_c_secyear.customer_id
   AND t_s_firstyear.customer_id = t_c_firstyear.customer_id
   AND t_s_firstyear.customer_id = t_w_firstyear.customer_id
   AND t_s_firstyear.customer_id = t_w_secyear.customer_id
   AND t_s_firstyear.sale_type = 's'
   AND t_c_firstyear.sale_type = 'c'
   AND t_w_firstyear.sale_type = 'w'
   AND t_s_secyear.sale_type = 's'
   AND t_c_secyear.sale_type = 'c'
   AND t_w_secyear.sale_type = 'w'
   AND t_s_firstyear.dyear+0 =  1999
   AND t_s_secyear.dyear+0 = 1999+1
   AND t_c_firstyear.dyear+0 =  1999
   AND t_c_secyear.dyear+0 =  1999+1
   AND t_w_firstyear.dyear+0 = 1999
   AND t_w_secyear.dyear+0 = 1999+1
   AND t_s_firstyear.year_total > 0
   AND t_c_firstyear.year_total > 0
   AND t_w_firstyear.year_total > 0
   AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
           > CASE WHEN t_s_firstyear.year_total > 0 THEN t_s_secyear.year_total / t_s_firstyear.year_total ELSE NULL END
   AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
           > CASE WHEN t_w_firstyear.year_total > 0 THEN t_w_secyear.year_total / t_w_firstyear.year_total ELSE NULL END
ORDER BY t_s_secyear.customer_id
         ,t_s_secyear.customer_first_name
         ,t_s_secyear.customer_last_name
         ,t_s_secyear.customer_birth_country
OPTION ( LABEL = 'Query04-af359846-253-3');

检查查询的预估执行计划。Check the query's estimated execution plan. 共有 18 个随机排布和 17 个联接操作,执行这些操作需要更多的时间。There are 18 shuffles and 17 joins operations, which take more time to execute. 现在让我们为三个子 SELECT 语句分别创建一个具体化视图。Now let's create one materialized view for each of the three sub-SELECT statements.

CREATE materialized view nbViewSS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
          , count_big(*) AS cb
FROM dbo.customer
     ,dbo.store_sales
     ,dbo.date_dim
WHERE c_customer_sk = ss_customer_sk
   AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
GO
CREATE materialized view nbViewCS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
          , count_big(*) as cb
FROM dbo.customer
     ,dbo.catalog_sales
     ,dbo.date_dim
WHERE c_customer_sk = cs_bill_customer_sk
   AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year

GO
CREATE materialized view nbViewWS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
          , count_big(*) AS cb
FROM dbo.customer
     ,dbo.web_sales
     ,dbo.date_dim
WHERE c_customer_sk = ws_bill_customer_sk
   AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year

再次检查原始查询的执行计划。Check the execution plan of the original query again. 现在,联接数量从 17 个更改为 5 个,且不再有随机排布。Now the number of joins changes from 17 to 5 and there's no shuffle anymore. 单击计划中的“筛选操作”图标,其输出列表显示数据是从具体化视图读取的,而不是从基表读取的。Click the Filter operation icon in the plan, its Output List shows the data is read from the materialized views instead of base tables.

Plan_Output_List_with_Materialized_Views

使用具体化视图,相同的查询运行得更快,且无需更改任何代码。With materialized views, the same query runs much faster without any code change.

后续步骤Next steps

有关更多开发技巧,请参阅 Synapse SQL 池开发概述For more development tips, see Synapse SQL pool development overview.