partition operator

Article
02/07/2024

The partition operator partitions the records of its input table into multiple subtables according to values in a key column. The operator runs a subquery on each subtable, and produces a single output table that is the union of the results of all subqueries.

This operator is useful when you need to perform a subquery only on a subset of rows that belongs to the same partition key, and not query the whole dataset. These subqueries could include aggregate functions, window functions, top N and others.

The partition operator supports several strategies of subquery operation:

Native - use with an implicit data source with thousands of key partition values.
Shuffle - use with an implicit source with millions of key partition values.
Legacy - use with an implicit or explicit source for 64 or less key partition values.

Syntax

T | partition [ hint.strategy=Strategy ] [ Hints ] by Column ( TransformationSubQuery )

T | partition [ hint.strategy=legacy ] [ Hints ] by Column { SubQueryWithSource }

Learn more about syntax conventions.

Parameters

Name	Type	Required	Description
T	`string`	✔️	The input tabular source.
Strategy	`string`		The value `legacy`, `shuffle`, or `native`. This hint defines the execution strategy of the partition operator. If no strategy is specified, the `legacy` strategy is used. For more information, see Strategies.
Column	`string`	✔️	The name of a column in T whose values determine how to partition the input tabular source.
TransformationSubQuery	`string`	✔️	A tabular transformation expression. The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogenous on the value of Column. The expression must provide only one tabular result and shouldn't have other types of statements, such as `let` statements.
SubQueryWithSource	`string`	✔️	A tabular expression that includes its own tabular source, such as a table reference. This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax `toscalar(`Column`)`. The expression must provide only one tabular result and shouldn't have other types of statements, such as `let` statements.
Hints	`string`		Zero or more space-separated parameters in the form of: HintName `=` Value that control the behavior of the operator. See the supported hints per strategy type.

Supported hints

Hint name	Type	Strategy	Description
`hint.shufflekey`	`string`	shuffle	The partition key used to run the partition operator with the `shuffle` strategy.
`hint.materialized`	`bool`	legacy	If set to `true`, will materialize the source of the `partition` operator. The default value is `false`.
`hint.concurrency`	`int`	legacy	Determines how many partitions to run in parallel. The default value is `16`.
`hint.spread`	`int`	legacy	Determines how to distribute the partitions among cluster nodes. The default value is `1`. For example, if there are N partitions and the spread hint is set to P, then the N partitions will be processed by P different cluster nodes equally in parallel/sequentially depending on the concurrency hint.

Returns

The operator returns a union of the results of the individual subqueries.

Strategies

The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.

Note

The distinction between the native and shuffle strategies allows the caller to indicate the cardinality and execution strategy of the subquery. This choice may affect how long the subquery takes to complete but doesn't change the end result.

Native strategy

This strategy should be applied when the number of distinct values of the partition key isn't large, roughly in the thousands.

The subquery must be a tabular transformation that doesn't specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There's no restriction on the number of partitions.

To use this strategy, specify hint.strategy=native.

Shuffle strategy

This strategy should be applied when the number of distinct values of the partition key is large, in the millions.

To use this strategy, specify hint.strategy=shuffle. For more information about shuffle strategy and performance, see shuffle query.

Supported operators for the native and shuffle strategies

The following list of operators can be used in subqueries with the native or shuffle strategies:

Note

Operators that reference a table source other than the subtable partitions aren't compatible with the native and shuffle strategies. For example, join, union, externaldata, and evaluate (plugins). For such scenarios, resort to the legacy strategy.
The fork operator isn't supported for any strategy type, as the subquery must return a single tabular result.

Legacy strategy

For historical reasons, the legacy strategy is the default strategy. However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.

In some scenarios, the legacy strategy might be necessary due to its support for including a tabular source in the subquery. In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).

If the subquery is a tabular transformation without a tabular source, the source is implicit and is based on the subtable partitions.

To use this strategy, specify hint.strategy=legacy or omit any other strategy indication.

Note

An error will occur if the partition column, Column, contains more than 64 distinct values.

Examples

Find top values

In some cases, it's more performant and easier to write a query using the partition operator than using the top-nested operator. The following query runs a subquery calculating summarize and top for each State starting with W: "WYOMING", "WASHINGTON", "WEST VIRGINIA", and "WISCONSIN".

Run the query

StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State 
    (
    summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
    | top 3 by Events 
    )

Output

EventType	State	Events	Injuries
Hail	WYOMING	108	0
High Wind	WYOMING	81	5
Winter Storm	WYOMING	72	0
Heavy Snow	WASHINGTON	82	0
High Wind	WASHINGTON	58	13
Wildfire	WASHINGTON	29	0
Thunderstorm Wind	WEST VIRGINIA	180	1
Hail	WEST VIRGINIA	103	0
Winter Weather	WEST VIRGINIA	88	0
Thunderstorm Wind	WISCONSIN	416	1
Winter Storm	WISCONSIN	310	0
Hail	WISCONSIN	303	1

Native strategy

The following query returns the top 2 EventType values by TotalInjuries for each State that starts with 'W':

Run the query

StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
    (
    summarize TotalInjueries = sum(InjuriesDirect) by EventType
    | top 2 by TotalInjueries
    )

Output

EventType	TotalInjueries
Tornado	4
Hail	1
Thunderstorm Wind	1
Excessive Heat	0
High Wind	13
Lightning	5
High Wind	5
Avalanche	3

Shuffle strategy

The following query returns the top 3 DamagedProperty values foreach EpisodeId and the columns EpisodeId and State.

Run the query

StormEvents
| partition hint.strategy=shuffle by EpisodeId
    (
    top 3 by DamageProperty
    | project EpisodeId, State, DamageProperty
    )
| count

Output

Count
22345

Legacy strategy with explicit source

The following query runs two subqueries:

When x == 1, the query returns all rows from StormEvents that have InjuriesIndirect == 1.
When x == 2, the query returns all rows from StormEvents that have InjuriesIndirect == 2.

The final result is the union of these two subqueries.

Run the query

range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count

Output

Count
113

Partition reference

The following example shows how to use the as operator to give a "name" to each data partition and then reuse that name within the subquery. This approach is only relevant to the legacy strategy.

T
| partition by Dim
(
    as Partition
    | extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)

partition operator

Syntax

Parameters

Supported hints

Returns

Strategies

Native strategy

Shuffle strategy

Supported operators for the native and shuffle strategies

Legacy strategy

Examples

Find top values

Native strategy

Shuffle strategy

Legacy strategy with explicit source

Partition reference

Additional resources