提示Hints

建议生成执行计划的特定方法。Suggest specific approaches to generate an execution plan.

语法Syntax

/*+ hint [ , ... ] */

分区提示Partitioning hints

通过分区提示,可建议 Databricks Runtime 应遵循的分区策略。Partitioning hints allow you to suggest a partitioning strategy that Databricks Runtime should follow. 支持 COALESCEREPARTITIONREPARTITION_BY_RANGE 提示,它们分别等效于数据集 API coalescerepartitionrepartitionByRangeCOALESCE, REPARTITION, and REPARTITION_BY_RANGE hints are supported and are equivalent to coalesce, repartition, and repartitionByRange Dataset APIs, respectively. 通过这些提示,你可优化性能并控制输出文件的数量。These hints give you a way to tune performance and control the number of output files. 如果指定了多个分区提示,则会在逻辑计划中插入多个节点,但优化器会选取最左侧的提示。When multiple partitioning hints are specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer.

分区提示类型Partitioning hint types

  • COALESCECOALESCE

    将分区数减少到指定的分区数。Reduce the number of partitions to the specified number of partitions. 它采用分区数作为参数。It takes a partition number as a parameter.

  • REPARTITIONREPARTITION

    使用指定的分区表达式将分区数调整为指定数量。Repartition to the specified number of partitions using the specified partitioning expressions. 它采用分区数和/或列名称作为参数。It takes a partition number, column names, or both as parameters.

  • REPARTITION_BY_RANGEREPARTITION_BY_RANGE

    使用指定的分区表达式将分区数调整为指定数量。Repartition to the specified number of partitions using the specified partitioning expressions. 它采用列名称和分区数作为参数,其中分区数是可选项。It takes column names and an optional partition number as parameters.

示例Examples

SELECT /*+ COALESCE(3) */ * FROM t;

SELECT /*+ REPARTITION(3) */ * FROM t;

SELECT /*+ REPARTITION(c) */ * FROM t;

SELECT /*+ REPARTITION(3, c) */ * FROM t;

SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t;

SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t;

-- multiple partitioning hints
EXPLAIN EXTENDED SELECT /*+ REPARTITION(100), COALESCE(500), REPARTITION_BY_RANGE(3, c) */ * FROM t;
== Parsed Logical Plan ==
'UnresolvedHint REPARTITION, [100]
+- 'UnresolvedHint COALESCE, [500]
   +- 'UnresolvedHint REPARTITION_BY_RANGE, [3, 'c]
      +- 'Project [*]
         +- 'UnresolvedRelation [t]

== Analyzed Logical Plan ==
name: string, c: int
Repartition 100, true
+- Repartition 500, false
   +- RepartitionByExpression [c#30 ASC NULLS FIRST], 3
      +- Project [name#29, c#30]
         +- SubqueryAlias spark_catalog.default.t
            +- Relation[name#29,c#30] parquet

== Optimized Logical Plan ==
Repartition 100, true
+- Relation[name#29,c#30] parquet

== Physical Plan ==
Exchange RoundRobinPartitioning(100), false, [id=#121]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t[name#29,c#30] Batched: true, DataFilters: [], Format: Parquet,
      Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [],
      PushedFilters: [], ReadSchema: struct<name:string>

联接提示Join hints

通过联接提示,可建议 Databricks Runtime 应使用的联接策略。Join hints allow you to suggest the join strategy that Databricks Runtime should use. 如果在联接的两端指定了不同的联接策略提示,Databricks Runtime 将按以下顺序设置提示优先级:BROADCAST 高于 MERGE 高于 SHUFFLE_HASH 高于 SHUFFLE_REPLICATE_NLWhen different join strategy hints are specified on both sides of a join, Databricks Runtime prioritizes hints in the following order: BROADCAST over MERGE over SHUFFLE_HASH over SHUFFLE_REPLICATE_NL. 如果两端都指定有 BROADCAST 提示或 SHUFFLE_HASH 提示,Databricks Runtime 会根据联接类型和关系大小选取生成端。When both sides are specified with the BROADCAST hint or the SHUFFLE_HASH hint, Databricks Runtime picks the build side based on the join type and the sizes of the relations. 由于给定的策略可能不支持部分联接类型,因此不保证 Databricks Runtime 使用提示建议的联接策略。Since a given strategy may not support all join types, Databricks Runtime is not guaranteed to use the join strategy suggested by the hint.

联接提示类型Join hint types

  • BROADCASTBROADCAST

    使用广播联接。Use broadcast join. 无论 autoBroadcastJoinThreshold 如何,都将广播带有提示的联接端。The join side with the hint is broadcast regardless of autoBroadcastJoinThreshold. 如果联接的两端都具有广播提示,则广播较小的一端(根据统计信息确定)。If both sides of the join have the broadcast hints, the one with the smaller size (based on stats) is broadcast. BROADCAST 的别名为 BROADCASTJOINMAPJOINThe aliases for BROADCAST are BROADCASTJOIN and MAPJOIN.

  • MERGEMERGE

    使用随机排序合并联接。Use shuffle sort merge join. MERGE 的别名为 SHUFFLE_MERGEMERGEJOINThe aliases for MERGE are SHUFFLE_MERGE and MERGEJOIN.

  • SHUFFLE_HASHSHUFFLE_HASH

    使用随机哈希联接。Use shuffle hash join. 如果两端都有随机哈希提示,Databricks Runtime 会选择较小的一端作为生成端(根据统计信息确定)。If both sides have the shuffle hash hints, Databricks Runtime chooses the smaller side (based on stats) as the build side.

  • SHUFFLE_REPLICATE_NLSHUFFLE_REPLICATE_NL

    使用随机复制嵌套循环联接。Use shuffle-and-replicate nested loop join.

示例Examples

-- Join Hints for broadcast join
SELECT /*+ BROADCAST(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ BROADCASTJOIN (t1) */ * FROM t1 left JOIN t2 ON t1.key = t2.key;
SELECT /*+ MAPJOIN(t2) */ * FROM t1 right JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle sort merge join
SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle hash join
SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle-and-replicate nested loop join
SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- When different join strategy hints are specified on both sides of a join, Databricks Runtime
-- prioritizes the BROADCAST hint over the MERGE hint over the SHUFFLE_HASH hint
-- over the SHUFFLE_REPLICATE_NL hint.
-- Databricks Runtime will issue Warning in the following example
-- org.apache.spark.sql.catalyst.analysis.HintErrorLogger: Hint (strategy=merge)
-- is overridden by another hint and will not take effect.
SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

倾斜提示Skew hints

(Azure Databricks 上的 Delta Lake)有关 SKEW 提示的详细信息,请参阅倾斜联接优化(Delta Lake on Azure Databricks) See Skew Join optimization for information about the SKEW hint.