ANALYZE TABLEANALYZE TABLE

收集有关表的统计信息,查询优化器可以使用该表查找更好的查询执行计划。Collects statistics about the table to be used by the query optimizer to find a better query execution plan.

语法Syntax

ANALYZE TABLE table_identifier [ partition_spec ]
    COMPUTE STATISTICS [ NOSCAN | FOR COLUMNS col [ , ... ] | FOR ALL COLUMNS ]

参数Parameters

  • table_identifiertable_identifier

    • [database_name.] table_name:表名,可选择使用数据库名称进行限定。[database_name.] table_name: A table name, optionally qualified with a database name.
    • delta.`<path-to-table>`:现有 Delta 表的位置。delta.`<path-to-table>`: The location of an existing Delta table.
  • partition_specpartition_spec

    一个可选参数,用于指定分区键值对的逗号分隔列表。An optional parameter that specifies a comma-separated list of key-value pairs for partitions. 如果指定,则返回分区统计信息。When specified, partition statistics are returned.

    语法: PARTITION ( partition_col_name [ = partition_col_val ] [ , ... ] )Syntax: PARTITION ( partition_col_name [ = partition_col_val ] [ , ... ] )

  • [ NOSCAN | FOR COLUMNS col [ , … ] | FOR ALL COLUMNS ][ NOSCAN | FOR COLUMNS col [ , … ] | FOR ALL COLUMNS ]

    如果未指定分析选项,ANALYZE TABLE 将收集表的行数和大小(以字节为单位)。If no analyze option is specified, ANALYZE TABLE collects the table’s number of rows and size in bytes.

    • NOSCANNOSCAN

      仅收集表的大小(以字节为单位)(不需要扫描整个表)。Collect only the table’s size in bytes ( which does not require scanning the entire table ).

    • FOR COLUMNS col [ , … ] | FOR ALL COLUMNSFOR COLUMNS col [ , … ] | FOR ALL COLUMNS

      收集每个指定列或每列的列统计信息,以及表统计信息。Collect column statistics for each column specified, or alternatively for every column, as well as table statistics.

示例Examples

CREATE TABLE students (name STRING, student_id INT) PARTITIONED BY (student_id);
INSERT INTO students PARTITION (student_id = 111111) VALUES ('Mark');
INSERT INTO students PARTITION (student_id = 222222) VALUES ('John');

ANALYZE TABLE students COMPUTE STATISTICS NOSCAN;

DESC EXTENDED students;
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                name|              string|   null|
|          student_id|                 int|   null|
|                 ...|                 ...|    ...|
|          Statistics|           864 bytes|       |
|                 ...|                 ...|    ...|
|  Partition Provider|             Catalog|       |
+--------------------+--------------------+-------+

ANALYZE TABLE students COMPUTE STATISTICS;

DESC EXTENDED students;
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                name|              string|   null|
|          student_id|                 int|   null|
|                 ...|                 ...|    ...|
|          Statistics|   864 bytes, 2 rows|       |
|                 ...|                 ...|    ...|
|  Partition Provider|             Catalog|       |
+--------------------+--------------------+-------+

ANALYZE TABLE students PARTITION (student_id = 111111) COMPUTE STATISTICS;

DESC EXTENDED students PARTITION (student_id = 111111);
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                name|              string|   null|
|          student_id|                 int|   null|
|                 ...|                 ...|    ...|
|Partition Statistics|   432 bytes, 1 rows|       |
|                 ...|                 ...|    ...|
|        OutputFormat|org.apache.hadoop...|       |
+--------------------+--------------------+-------+

ANALYZE TABLE students COMPUTE STATISTICS FOR COLUMNS name;

DESC EXTENDED students name;
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|      name|
|     data_type|    string|
|       comment|      NULL|
|           min|      NULL|
|           max|      NULL|
|     num_nulls|         0|
|distinct_count|         2|
|   avg_col_len|         4|
|   max_col_len|         4|
|     histogram|      NULL|
+--------------+----------+