Synapse SQL 池中的表统计信息Table statistics in Synapse SQL pool

本文提供的建议和示例适用于创建和更新 SQL 池中有关表的查询优化统计信息。In this article, you'll find recommendations and examples for creating and updating query-optimization statistics on tables in SQL pool.

为何使用统计信息Why use statistics

SQL 池对数据了解得越多,其针对数据执行查询的速度就越快。The more SQL pool knows about your data, the faster it can execute queries against it. 将数据载入 SQL 池之后,收集有关数据的统计信息是优化查询时可做的最重要事情之一。After loading data into SQL pool, collecting statistics on your data is one of the most important things you can do to optimize your queries.

SQL 池查询优化器是基于成本的优化器。The SQL pool query optimizer is a cost-based optimizer. 此优化器会对各种查询计划的成本进行比较,并选择成本最低的计划。It compares the cost of various query plans, and then chooses the plan with the lowest cost. 在大多数情况下,它会选择执行速度最快的计划。In most cases, it chooses the plan that will execute the fastest.

例如,如果优化器估计查询筛选的日期会返回一行数据,则它会选择一个计划。For example, if the optimizer estimates that the date your query is filtering on will return one row it will choose one plan. 如果优化器估计选定的日期会返回 1 百万行数据,则它会返回另一个计划。If it estimates that the selected date will return 1 million rows, it will return a different plan.

自动创建统计信息Automatic creation of statistic

启用数据库的 AUTO_CREATE_STATISTICS 选项时,SQL 池会分析传入的用户查询中是否缺少统计信息。When the database AUTO_CREATE_STATISTICS option is on, SQL pool analyzes incoming user queries for missing statistics.

如果缺少统计信息,查询优化器将在查询谓词或联接条件中各个列上创建统计信息,以改进查询计划的基数估计。If statistics are missing, the query optimizer creates statistics on individual columns in the query predicate or join condition to improve cardinality estimates for the query plan.

备注

默认情况下,自动创建统计信息目前处于开启状态。Automatic creation of statistics is currently turned on by default.

可运行以下命令来检查是否为 SQL 池配置了 AUTO_CREATE_STATISTICS:You can check if your SQL pool has AUTO_CREATE_STATISTICS configured by running the following command:

SELECT name, is_auto_create_stats_on
FROM sys.databases

如果 SQL 池未配置 AUTO_CREATE_STATISTICS,建议通过运行以下命令来启用此属性:If your SQL pool doesn't have AUTO_CREATE_STATISTICS configured, we recommend you enable this property by running the following command:

ALTER DATABASE <yourdatawarehousename>
SET AUTO_CREATE_STATISTICS ON

当检测到包含联接或存在某个谓词时,这些语句将触发统计信息的自动创建:These statements will trigger automatic creation of statistics:

  • SELECTSELECT
  • INSERT-SELECTINSERT-SELECT
  • CTASCTAS
  • UPDATEUPDATE
  • DELETEDELETE
  • EXPLAINEXPLAIN when containing a join or the presence of a predicate is detected

备注

不在临时或外部表上创建“自动创建统计信息”。Automatic creation of statistics are not created on temporary or external tables.

自动创建统计信息的过程是以同步方式完成的,因此,如果列中缺少统计信息,查询性能可能会轻微下降。Automatic creation of statistics is done synchronously so you may incur slightly degraded query performance if your columns are missing statistics. 为单个列创建统计信息所需的时间取决于表的大小。The time to create statistics for a single column depends on the size of the table.

为了避免性能大幅下降,应确保在分析系统之前先通过执行基准检验工作负载来创建统计信息。To avoid measurable performance degradation, you should ensure stats have been created first by executing the benchmark workload before profiling the system.

备注

统计信息的创建将记录在其他用户上下文中的 sys.dm_pdw_exec_requests 中。The creation of stats will be logged in sys.dm_pdw_exec_requests under a different user context.

在创建自动统计信息时,它们将采用以下格式:WA_Sys<以十六进制表示的 8 位列 ID><以十六进制表示的 8 位表 ID>。When automatic statistics are created, they will take the form: WA_Sys<8 digit column id in Hex><8 digit table id in Hex>. 可以通过运行 DBCC SHOW_STATISTICS 命令查看已创建的统计信息:You can view stats that have already been created by running the DBCC SHOW_STATISTICS command:

DBCC SHOW_STATISTICS (<table_name>, <target>)

table_name 是包含要显示的统计信息的表的名称。The table_name is the name of the table that contains the statistics to display. 此表不能为外部表。This table can't be an external table. 目标是要显示统计信息的目标索引、统计信息或列的名称。The target is the name of the target index, statistics, or column for which to display statistics information.

更新统计信息Update statistics

最佳实践之一是每天在添加新日期后,更新有关日期列的统计信息。One best practice is to update statistics on date columns each day as new dates are added. 每次有新行载入 SQL 池时,就会添加新的加载日期或事务日期。Each time new rows are loaded into the SQL pool, new load dates or transaction dates are added. 这些添加操作会更改数据分布情况并使统计信息过时。These additions change the data distribution and make the statistics out of date.

有关客户表中的国家/地区列的统计信息可能永远不需要更新,因为值的分布通常不会变化。Statistics on a country/region column in a customer table might never need to be updated since the distribution of values doesn't generally change. 假设客户间的分布固定不变,将新行添加到表变化并不会改变数据分布情况。Assuming the distribution is constant between customers, adding new rows to the table variation isn't going to change the data distribution.

但是,如果 SQL 池只包含一个国家/地区,并且引入了来自新国家/地区的数据,导致存储了多个国家/地区的数据,那么,就需要更新有关国家/地区列的统计信息。However, if your SQL pool only contains one country/region, and you bring in data from a new country/region, resulting in data from multiple countries/regions being stored, then you need to update statistics on the country/region column.

下面是关于更新统计信息的建议:The following are recommendations updating statistics:

统计信息更新频率Frequency of stats updates 保守:每日Conservative: Daily
加载或转换数据之后After loading or transforming your data
采样Sampling 如果行数少于 10 亿,则使用默认采样率 (20%)。Less than 1 billion rows, use default sampling (20 percent).
如果行数超过 10 亿,则使用 2% 的采样率。With more than 1 billion rows, use sampling of two percent.

在排查查询问题时,首先要询问的问题之一就是 “统计信息是最新的吗?”One of the first questions to ask when you're troubleshooting a query is, "Are the statistics up to date?"

此问题无法根据数据期限得到答案。This question isn't one that can be answered by the age of the data. 如果对基础数据未做重大更改,则最新的统计信息对象有可能非常陈旧。An up-to-date statistics object might be old if there's been no material change to the underlying data. 如果行数有明显变化或给定列的值分布有重大变化,则 需要更新统计信息。When the number of rows has changed substantially, or there is a material change in the distribution of values for a column, then it's time to update statistics.

没有任何动态管理视图可用于确定自上次更新统计信息以来表中的数据是否发生更改。There is no dynamic management view to determine if data within the table has changed since the last time statistics were updated. 以下两个查询可帮助确定统计信息是否过时。The following two queries can help you determine whether your statistics are stale.

查询 1: 找出统计信息中的行计数 (stats_row_count) 与实际行计数 (actual_row_count) 之间的差异。Query 1: Find out the difference between the row count from the statistics (stats_row_count) and the actual row count (actual_row_count).

select 
objIdsWithStats.[object_id], 
actualRowCounts.[schema], 
actualRowCounts.logical_table_name, 
statsRowCounts.stats_row_count, 
actualRowCounts.actual_row_count,
row_count_difference = CASE
    WHEN actualRowCounts.actual_row_count >= statsRowCounts.stats_row_count THEN actualRowCounts.actual_row_count - statsRowCounts.stats_row_count
    ELSE statsRowCounts.stats_row_count - actualRowCounts.actual_row_count
END,
percent_deviation_from_actual = CASE
    WHEN actualRowCounts.actual_row_count = 0 THEN statsRowCounts.stats_row_count
    WHEN statsRowCounts.stats_row_count = 0 THEN actualRowCounts.actual_row_count
    WHEN actualRowCounts.actual_row_count >= statsRowCounts.stats_row_count THEN CONVERT(NUMERIC(18, 0), CONVERT(NUMERIC(18, 2), (actualRowCounts.actual_row_count - statsRowCounts.stats_row_count)) / CONVERT(NUMERIC(18, 2), actualRowCounts.actual_row_count) * 100)
    ELSE CONVERT(NUMERIC(18, 0), CONVERT(NUMERIC(18, 2), (statsRowCounts.stats_row_count - actualRowCounts.actual_row_count)) / CONVERT(NUMERIC(18, 2), actualRowCounts.actual_row_count) * 100)
END
from
(
    select distinct object_id from sys.stats where stats_id > 1
) objIdsWithStats
left join
(
    select object_id, sum(rows) as stats_row_count from sys.partitions group by object_id
) statsRowCounts
on objIdsWithStats.object_id = statsRowCounts.object_id 
left join
(
    SELECT sm.name [schema] ,
    tb.name logical_table_name ,
    tb.object_id object_id ,
    SUM(rg.row_count) actual_row_count
    FROM sys.schemas sm
    INNER JOIN sys.tables tb ON sm.schema_id = tb.schema_id
    INNER JOIN sys.pdw_table_mappings mp ON tb.object_id = mp.object_id
    INNER JOIN sys.pdw_nodes_tables nt ON nt.name = mp.physical_name
    INNER JOIN sys.dm_pdw_nodes_db_partition_stats rg
    ON rg.object_id = nt.object_id
    AND rg.pdw_node_id = nt.pdw_node_id
    AND rg.distribution_id = nt.distribution_id
    WHERE 1 = 1
    GROUP BY sm.name, tb.name, tb.object_id
) actualRowCounts
on objIdsWithStats.object_id = actualRowCounts.object_id

查询 2: 通过检查每个表中上次更新统计信息的时间,找出统计信息的使用年限。Query 2: Find out the age of your statistics by checking the last time your statistics were updated on each table.

备注

如果给定列的值分布有重大变化,则应该更新统计信息,不管上次更新时间为何。If there is a material change in the distribution of values for a column, you should update statistics regardless of the last time they were updated.

SELECT
    sm.[name] AS [schema_name],
    tb.[name] AS [table_name],
    co.[name] AS [stats_column_name],
    st.[name] AS [stats_name],
    STATS_DATE(st.[object_id],st.[stats_id]) AS [stats_last_updated_date]
FROM
    sys.objects ob
    JOIN sys.stats st
        ON  ob.[object_id] = st.[object_id]
    JOIN sys.stats_columns sc
        ON  st.[stats_id] = sc.[stats_id]
        AND st.[object_id] = sc.[object_id]
    JOIN sys.columns co
        ON  sc.[column_id] = co.[column_id]
        AND sc.[object_id] = co.[object_id]
    JOIN sys.types  ty
        ON  co.[user_type_id] = ty.[user_type_id]
    JOIN sys.tables tb
        ON  co.[object_id] = tb.[object_id]
    JOIN sys.schemas sm
        ON  tb.[schema_id] = sm.[schema_id]
WHERE
    st.[user_created] = 1;

例如,SQL 池中的日期列往往需要经常更新统计信息 。Date columns in a SQL pool, for example, usually need frequent statistics updates. 每次有新行载入 SQL 池时,就会添加新的加载日期或事务日期。Each time new rows are loaded into the SQL pool, new load dates or transaction dates are added. 这些添加操作会更改数据分布情况并使统计信息过时。These additions change the data distribution and make the statistics out of date.

相反地,客户表上性别列的统计信息可能永远不需要更新。Conversely, statistics on a gender column in a customer table might never need to be updated. 假设客户间的分布固定不变,将新行添加到表变化并不会改变数据分布情况。Assuming the distribution is constant between customers, adding new rows to the table variation isn't going to change the data distribution.

如果 SQL 池只包含一种性别,而新的要求会导致出现多种性别,则需更新性别列的统计信息。If your SQL pool contains only one gender and a new requirement results in multiple genders, then you need to update statistics on the gender column.

有关详细信息,请参阅统计信息的通用指南。For more information, see general guidance for Statistics.

实施统计信息管理Implementing statistics management

通常可以扩展数据加载过程,确保在加载结束时更新统计信息,避免/最大程度地减少并发查询之间出现阻塞或资源争用的情况。It is often a good idea to extend your data-loading process to ensure that statistics are updated at the end of the load to avoid/minimize blocking or resource contention between concurrent queries.

当表更改其大小和/或其值分布时,数据加载最为频繁。The data load is when tables most frequently change their size and/or their distribution of values. 可以通过数据加载实施某些管理过程。Data-loading is a logical place to implement some management processes.

下面提供了有关更新统计信息的一些指导原则:The following guiding principles are provided for updating your statistics:

  • 确保加载的每个表至少包含一个更新的统计信息对象。Ensure that each loaded table has at least one statistics object updated. 这会在统计信息更新过程中更新表大小(行计数和页计数)信息。This updates the table size (row count and page count) information as part of the statistics update.
  • 将重点放在参与 JOIN、GROUP BY、ORDER BY 和 DISTINCT 子句的列上。Focus on columns participating in JOIN, GROUP BY, ORDER BY, and DISTINCT clauses.
  • 考虑更频繁地更新“递增键”列(例如事务日期),因为这些值不包含在统计信息直方图中。Consider updating "ascending key" columns such as transaction dates more frequently, because these values will not be included in the statistics histogram.
  • 考虑较不经常更新静态分布列。Consider updating static distribution columns less frequently.
  • 请记住,每个统计信息对象是按顺序更新的。Remember, each statistic object is updated in sequence. 仅实现 UPDATE STATISTICS <TABLE_NAME> 不一定总很理想 - 尤其是对包含许多统计信息对象的宽型表而言。Simply implementing UPDATE STATISTICS <TABLE_NAME> isn't always ideal, especially for wide tables with lots of statistics objects.

有关详细信息,请参阅基数估计For more information, see Cardinality Estimation.

示例:创建统计信息Examples: Create statistics

以下示例演示如何使用各种选项来创建统计信息。These examples show how to use various options for creating statistics. 用于每个列的选项取决于数据特征以及在查询中使用列的方式。The options that you use for each column depend on the characteristics of your data and how the column will be used in queries.

使用默认选项创建单列统计信息Create single-column statistics with default options

若要基于某个列创建统计信息,需要提供统计信息对象的名称和列的名称。To create statistics on a column, provide a name for the statistics object and the name of the column.

此语法使用所有默认选项。This syntax uses all of the default options. 默认情况下,SQL 池在创建统计信息时会提取 20% 的表数据作为样本。By default, SQL pool samples 20 percent of the table when it creates statistics.

CREATE STATISTICS [statistics_name] ON [schema_name].[table_name]([column_name]);

例如:For example:

CREATE STATISTICS col1_stats ON dbo.table1 (col1);

通过检查每个行创建单列统计信息Create single-column statistics by examining every row

20% 的默认采样率足以应付大多数情况。The default sampling rate of 20 percent is sufficient for most situations. 不过,可以调整采样率。However, you can adjust the sampling rate.

若要采样整个表,请使用此语法:To sample the full table, use this syntax:

CREATE STATISTICS [statistics_name] ON [schema_name].[table_name]([column_name]) WITH FULLSCAN;

例如:For example:

CREATE STATISTICS col1_stats ON dbo.table1 (col1) WITH FULLSCAN;

通过指定样本大小创建单列统计信息Create single-column statistics by specifying the sample size

或者,可以以百分比指定样本大小:Alternatively, you can specify the sample size as a percent:

CREATE STATISTICS col1_stats ON dbo.table1 (col1) WITH SAMPLE = 50 PERCENT;

只对某些行创建单列统计信息Create single-column statistics on only some of the rows

也可以对表中的部分行创建统计信息。You can also create statistics on a portion of the rows in your table. 这称为筛选的统计信息。This is called a filtered statistic.

例如,在计划查询大型分区表的特定分区时,可以使用筛选的统计信息。For example, you can use filtered statistics when you plan to query a specific partition of a large partitioned table. 只对分区值创建统计信息,统计信息的准确度会改善,并因而改善查询性能。By creating statistics on only the partition values, the accuracy of the statistics will improve, and therefore improve query performance.

此示例会基于一系列的值创建统计信息。This example creates statistics on a range of values. 可以轻松定义这些值以匹配分区中的值范围。The values can easily be defined to match the range of values in a partition.

CREATE STATISTICS stats_col1 ON table1(col1) WHERE col1 > '2000101' AND col1 < '20001231';

备注

若要让查询优化器在选择分布式查询计划时考虑使用筛选的统计信息,查询必须符合统计信息对象的定义。For the query optimizer to consider using filtered statistics when it chooses the distributed query plan, the query must fit inside the definition of the statistics object. 使用上述示例,查询的 WHERE 子句需要指定介于 2000101 和 20001231 之间的 col1 值。Using the previous example, the query's WHERE clause needs to specify col1 values between 2000101 and 20001231.

使用所有选项创建单列统计信息Create single-column statistics with all the options

也可以将选项组合在一起。You can also combine the options together. 以下示例使用自定义样本大小创建筛选的统计信息对象:The following example creates a filtered statistics object with a custom sample size:

CREATE STATISTICS stats_col1 ON table1 (col1) WHERE col1 > '2000101' AND col1 < '20001231' WITH SAMPLE = 50 PERCENT;

有关完整参考,请参阅 CREATE STATISTICSFor the full reference, see CREATE STATISTICS.

创建多列统计信息Create multi-column statistics

若要创建多列统计信息对象,请使用上述示例,但要指定更多的列。To create a multi-column statistics object, use the previous examples, but specify more columns.

备注

用于估计查询结果中行数的直方图只适用于统计信息对象定义中所列的第一个列。The histogram, which is used to estimate the number of rows in the query result, is only available for the first column listed in the statistics object definition.

在此示例中,直方图位于 product_category。In this example, the histogram is on product_category. 跨列统计信息是根据 product_categoryproduct_sub_category 计算的:Cross-column statistics are calculated on product_category and product_sub_category:

CREATE STATISTICS stats_2cols ON table1 (product_category, product_sub_category) WHERE product_category > '2000101' AND product_category < '20001231' WITH SAMPLE = 50 PERCENT;

由于 product_categoryproduct_sub_category 之间存在关联,因此在同时访问这些列时,多列统计信息对象相当有用。Because there is a correlation between product_category and product_sub_category, a multi-column statistics object can be useful if these columns are accessed at the same time.

基于表中的所有列创建统计信息Create statistics on all columns in a table

创建统计信息的方法之一是在创建表后发出 CREATE STATISTICS 命令:One way to create statistics is to issue CREATE STATISTICS commands after creating the table:

CREATE TABLE dbo.table1
(
   col1 int
,  col2 int
,  col3 int
)
WITH
  (
    CLUSTERED COLUMNSTORE INDEX
  )
;

CREATE STATISTICS stats_col1 on dbo.table1 (col1);
CREATE STATISTICS stats_col2 on dbo.table2 (col2);
CREATE STATISTICS stats_col3 on dbo.table3 (col3);

使用存储过程基于数据库中的所有列创建统计信息Use a stored procedure to create statistics on all columns in a database

SQL 池不提供相当于 SQL Server 中 sp_create_stats 的系统存储过程。SQL pool does not have a system stored procedure equivalent to sp_create_stats in SQL Server. 此存储过程基于数据库中尚不包含统计信息的每个列创建单列统计信息对象。This stored procedure creates a single column statistics object on every column of the database that doesn't already have statistics.

以下示例可以帮助你开始进行数据库设计。The following example will help you get started with your database design. 可以根据需要任意改写此存储过程。Feel free to adapt it to your needs.

CREATE PROCEDURE    [dbo].[prc_sqldw_create_stats]
(   @create_type    tinyint -- 1 default 2 Fullscan 3 Sample
,   @sample_pct     tinyint
)
AS

IF @create_type IS NULL
BEGIN
    SET @create_type = 1;
END;

IF @create_type NOT IN (1,2,3)
BEGIN
    THROW 151000,'Invalid value for @stats_type parameter. Valid range 1 (default), 2 (fullscan) or 3 (sample).',1;
END;

IF @sample_pct IS NULL
BEGIN;
    SET @sample_pct = 20;
END;

IF OBJECT_ID('tempdb..#stats_ddl') IS NOT NULL
BEGIN;
    DROP TABLE #stats_ddl;
END;

CREATE TABLE #stats_ddl
WITH    (   DISTRIBUTION    = HASH([seq_nmbr])
        ,   LOCATION        = USER_DB
        )
AS
WITH T
AS
(
SELECT      t.[name]                        AS [table_name]
,           s.[name]                        AS [table_schema_name]
,           c.[name]                        AS [column_name]
,           c.[column_id]                   AS [column_id]
,           t.[object_id]                   AS [object_id]
,           ROW_NUMBER()
            OVER(ORDER BY (SELECT NULL))    AS [seq_nmbr]
FROM        sys.[tables] t
JOIN        sys.[schemas] s         ON  t.[schema_id]       = s.[schema_id]
JOIN        sys.[columns] c         ON  t.[object_id]       = c.[object_id]
LEFT JOIN   sys.[stats_columns] l   ON  l.[object_id]       = c.[object_id]
                                    AND l.[column_id]       = c.[column_id]
                                    AND l.[stats_column_id] = 1
LEFT JOIN    sys.[external_tables] e    ON    e.[object_id]        = t.[object_id]
WHERE       l.[object_id] IS NULL
AND            e.[object_id] IS NULL -- not an external table
)
SELECT  [table_schema_name]
,       [table_name]
,       [column_name]
,       [column_id]
,       [object_id]
,       [seq_nmbr]
,       CASE @create_type
        WHEN 1
        THEN    CAST('CREATE STATISTICS '+QUOTENAME('stat_'+table_schema_name+ '_' + table_name + '_'+column_name)+' ON '+QUOTENAME(table_schema_name)+'.'+QUOTENAME(table_name)+'('+QUOTENAME(column_name)+')' AS VARCHAR(8000))
        WHEN 2
        THEN    CAST('CREATE STATISTICS '+QUOTENAME('stat_'+table_schema_name+ '_' + table_name + '_'+column_name)+' ON '+QUOTENAME(table_schema_name)+'.'+QUOTENAME(table_name)+'('+QUOTENAME(column_name)+') WITH FULLSCAN' AS VARCHAR(8000))
        WHEN 3
        THEN    CAST('CREATE STATISTICS '+QUOTENAME('stat_'+table_schema_name+ '_' + table_name + '_'+column_name)+' ON '+QUOTENAME(table_schema_name)+'.'+QUOTENAME(table_name)+'('+QUOTENAME(column_name)+') WITH SAMPLE '+CONVERT(varchar(4),@sample_pct)+' PERCENT' AS VARCHAR(8000))
        END AS create_stat_ddl
FROM T
;

DECLARE @i INT              = 1
,       @t INT              = (SELECT COUNT(*) FROM #stats_ddl)
,       @s NVARCHAR(4000)   = N''
;

WHILE @i <= @t
BEGIN
    SET @s=(SELECT create_stat_ddl FROM #stats_ddl WHERE seq_nmbr = @i);

    PRINT @s
    EXEC sp_executesql @s
    SET @i+=1;
END

DROP TABLE #stats_ddl;

若要使用默认设置基于表中的所有列创建统计信息,请执行存储过程。To create statistics on all columns in the table using the defaults, execute the stored procedure.

EXEC [dbo].[prc_sqldw_create_stats] 1, NULL;

若要通过全面扫描基于表中的所有列创建统计信息,请调用此过程。To create statistics on all columns in the table using a fullscan, call this procedure.

EXEC [dbo].[prc_sqldw_create_stats] 2, NULL;

要对表中的所有列创建示例统计信息,请输入 3 和示例百分比。To create sampled statistics on all columns in the table, enter 3, and the sample percent. 此过程使用 20% 的采样率。This procedure uses a 20 percent sample rate.

EXEC [dbo].[prc_sqldw_create_stats] 3, 20;

示例:更新统计信息Examples: Update statistics

要更新统计信息,可以:To update statistics, you can:

  • 更新一个统计信息对象。Update one statistics object. 指定要更新的统计信息对象名称。Specify the name of the statistics object you want to update.
  • 更新表中的所有统计信息对象。Update all statistics objects on a table. 指定表名称,而不是一个特定的统计信息对象。Specify the name of the table instead of one specific statistics object.

更新一个特定的统计信息对象Update one specific statistics object

使用以下语法来更新特定的统计信息对象:Use the following syntax to update a specific statistics object:

UPDATE STATISTICS [schema_name].[table_name]([stat_name]);

例如:For example:

UPDATE STATISTICS [dbo].[table1] ([stats_col1]);

通过更新特定统计信息对象,可以减少管理统计信息所需的时间和资源。By updating specific statistics objects, you can minimize the time and resources required to manage statistics. 这需要经过一定的思考,以选择要更新的最佳统计信息对象。Doing so requires some thought to choose the best statistics objects to update.

更新表的所有统计信息Update all statistics on a table

用于更新表中所有统计信息对象的一个简单方法如下:A simple method for updating all the statistics objects on a table is:

UPDATE STATISTICS [schema_name].[table_name];

例如:For example:

UPDATE STATISTICS dbo.table1;

UPDATE STATISTICS 语句很容易使用。The UPDATE STATISTICS statement is easy to use. 只要记住,这会更新表中的所有统计信息,因此执行的工作可能会超过所需的数量。 Just remember that it updates all statistics on the table, and therefore might perform more work than is necessary. 如果性能不是一个考虑因素,这是保证拥有最新统计信息的最简单、最全面的操作方式。If performance is not an issue, this is the easiest and most complete way to guarantee that statistics are up to date.

备注

更新表中的所有统计信息时,SQL 池将执行扫描,以针对每个统计信息对象进行表采样。When updating all statistics on a table, SQL pool does a scan to sample the table for each statistics object. 如果表很大、包含许多列和许多统计信息,则根据需要更新各项统计信息可能比较有效率。If the table is large and has many columns and many statistics, it might be more efficient to update individual statistics based on need.

有关 UPDATE STATISTICS 过程的实现,请参阅临时表For an implementation of an UPDATE STATISTICS procedure, see Temporary Tables. 实现方法与上述 CREATE STATISTICS 过程略有不同,但最终结果相同。The implementation method is slightly different from the preceding CREATE STATISTICS procedure, but the result is the same.

有关完整语法,请参阅更新统计信息For the full syntax, see Update Statistics.

统计信息元数据Statistics metadata

可以使用多个系统视图和函数来查找有关统计信息的信息。There are several system views and functions that you can use to find information about statistics. 例如,使用 stats-date 函数查看上次创建或更新统计信息的时间,即可了解统计信息对象是否可能已过时。For example, you can see if a statistics object might be out of date by using the stats-date function to see when statistics were last created or updated.

统计信息的目录视图Catalog views for statistics

这些系统视图提供有关统计信息的信息:These system views provide information about statistics:

目录视图Catalog view 说明Description
sys.columnssys.columns 针对每个列提供一行。One row for each column.
sys.objectssys.objects 针对数据库中的每个对象提供一行。One row for each object in the database.
sys.schemassys.schemas 针对数据库中的每个架构提供一行。One row for each schema in the database.
sys.statssys.stats 针对每个统计信息对象提供一行。One row for each statistics object.
sys.stats_columnssys.stats_columns 针对统计信息对象中的每个列提供一行。One row for each column in the statistics object. 链接回到 sys.columns。Links back to sys.columns.
sys.tablessys.tables 针对每个表(包括外部表)提供一行。One row for each table (includes external tables).
sys.table_typessys.table_types 针对每个数据类型提供一行。One row for each data type.

统计信息的系统函数System functions for statistics

这些系统函数适合用于处理统计信息:These system functions are useful for working with statistics:

系统函数System function 说明Description
STATS_DATESTATS_DATE 上次更新统计信息对象的日期。Date the statistics object was last updated.
DBCC SHOW_STATISTICSDBCC SHOW_STATISTICS 有关统计信息对象识别的值分布的摘要级别和详细信息。Summary level and detailed information about the distribution of values as understood by the statistics object.

将统计信息列和函数合并成一个视图Combine statistics columns and functions into one view

此视图将统计信息相关的列以及 STATS_DATE() 函数的结果合并在一起。This view brings columns that relate to statistics and results from the STATS_DATE() function together.

CREATE VIEW dbo.vstats_columns
AS
SELECT
        sm.[name]                           AS [schema_name]
,       tb.[name]                           AS [table_name]
,       st.[name]                           AS [stats_name]
,       st.[filter_definition]              AS [stats_filter_definition]
,       st.[has_filter]                     AS [stats_is_filtered]
,       STATS_DATE(st.[object_id],st.[stats_id])
                                            AS [stats_last_updated_date]
,       co.[name]                           AS [stats_column_name]
,       ty.[name]                           AS [column_type]
,       co.[max_length]                     AS [column_max_length]
,       co.[precision]                      AS [column_precision]
,       co.[scale]                          AS [column_scale]
,       co.[is_nullable]                    AS [column_is_nullable]
,       co.[collation_name]                 AS [column_collation_name]
,       QUOTENAME(sm.[name])+'.'+QUOTENAME(tb.[name])
                                            AS two_part_name
,       QUOTENAME(DB_NAME())+'.'+QUOTENAME(sm.[name])+'.'+QUOTENAME(tb.[name])
                                            AS three_part_name
FROM    sys.objects                         AS ob
JOIN    sys.stats           AS st ON    ob.[object_id]      = st.[object_id]
JOIN    sys.stats_columns   AS sc ON    st.[stats_id]       = sc.[stats_id]
                            AND         st.[object_id]      = sc.[object_id]
JOIN    sys.columns         AS co ON    sc.[column_id]      = co.[column_id]
                            AND         sc.[object_id]      = co.[object_id]
JOIN    sys.types           AS ty ON    co.[user_type_id]   = ty.[user_type_id]
JOIN    sys.tables          AS tb ON  co.[object_id]        = tb.[object_id]
JOIN    sys.schemas         AS sm ON  tb.[schema_id]        = sm.[schema_id]
WHERE   1=1
AND     st.[user_created] = 1
;

DBCC SHOW_STATISTICS() 示例DBCC SHOW_STATISTICS() examples

DBCC SHOW_STATISTICS() 显示统计信息对象中保存的数据。DBCC SHOW_STATISTICS() shows the data held within a statistics object. 这些数据包括三个组成部分:This data comes in three parts:

  • 标头Header
  • 密度矢量Density vector
  • 直方图Histogram

有关统计信息的标头元数据。The header metadata about the statistics. 直方图显示统计信息对象的第一个键列中的值分布。The histogram displays the distribution of values in the first key column of the statistics object. 密度向量可度量跨列相关性。The density vector measures cross-column correlation.

备注

SQL 池使用统计信息对象中的任何数据来计算基数估计值。SQL pool computes cardinality estimates with any of the data in the statistics object.

显示标头、密度和直方图Show header, density, and histogram

此简单示例显示了统计信息对象的所有三个组成部分:This simple example shows all three parts of a statistics object:

DBCC SHOW_STATISTICS([<schema_name>.<table_name>],<stats_name>)

例如:For example:

DBCC SHOW_STATISTICS (dbo.table1, stats_col1);

显示 DBCC SHOW_STATISTICS() 的一个或多个组成部分Show one or more parts of DBCC SHOW_STATISTICS()

如果只想要查看特定部分,请使用 WITH 子句并指定要查看哪些部分:If you're only interested in viewing specific parts, use the WITH clause and specify which parts you want to see:

DBCC SHOW_STATISTICS([<schema_name>.<table_name>],<stats_name>) WITH stat_header, histogram, density_vector

例如:For example:

DBCC SHOW_STATISTICS (dbo.table1, stats_col1) WITH histogram, density_vector

DBCC SHOW_STATISTICS() 差异DBCC SHOW_STATISTICS() differences

相比于 SQL Server,在 SQL 池中,DBCC SHOW_STATISTICS() 的实现更加严格。DBCC SHOW_STATISTICS() is more strictly implemented in SQL pool compared to SQL Server:

  • 未阐述的功能不受支持。Undocumented features are not supported.
  • 不能使用 Stats_stream。Cannot use Stats_stream.
  • 不能联接特定统计信息子集的结果。Cannot join results for specific subsets of statistics data. 例如 STAT_HEADER JOIN DENSITY_VECTOR。For example, STAT_HEADER JOIN DENSITY_VECTOR.
  • 不能针对消息隐藏设置 NO_INFOMSGS。NO_INFOMSGS cannot be set for message suppression.
  • 不能在统计信息名称的前后使用方括号。Square brackets around statistics names cannot be used.
  • 不能使用列名来标识统计信息对象。Cannot use column names to identify statistics objects.
  • 不支持自定义错误 2767。Custom error 2767 is not supported.

后续步骤Next steps

有关进一步提升查询性能的信息,请参阅监视工作负荷For further improve query performance, see Monitor your workload