bag_unpack plugin

适用于:✅Azure 数据资源管理器

bag_unpack 插件通过将每个属性包顶级槽视为列来解压缩 dynamic 类型的单个列。 该插件通过 evaluate 运算符调用。

Syntax

T|evaluatebag_unpack(Column [,OutputColumnPrefix ] [,columnsConflict ] [,ignoredProperties ] ) [:OutputSchema]

Learn more about syntax conventions.

Parameters

Name 类型 Required Description
T string ✔️ The tabular input whose column Column is to be unpacked.
Column dynamic ✔️ The column of T to unpack.
OutputColumnPrefix string 添加到插件生成的所有列的常见前缀。
columnsConflict string 有关解决列冲突的指导。 Valid values:
error - 查询生成错误(默认值)
replace_source -源列被替换
keep_source - 源列被保留
ignoredProperties dynamic 要忽略的一组可选包属性。 }
OutputSchema 指定插件输出的 bag_unpack 列名称和类型。 有关语法信息,请参阅 输出架构语法,并了解含义,请参阅 性能注意事项

输出架构语法

( ColumnName:ColumnType [, ...] )

使用通配符 * 作为第一个参数将源表的所有列包含在输出中,如下所示:

( * , ColumnName:ColumnType [, ...] )

Performance considerations

Using the plugin without an OutputSchema can have severe performance implications in large datasets and should be avoided.

Providing an OutputSchema allows the query engine to optimize the query execution, as it can determine the output schema without needing to parse and analyze the input data. 当输入数据很大或很复杂时,这非常有用。 请参阅 示例,其中显示了 在未定义 OutputSchema 的情况下使用插件的性能影响。

Returns

The bag_unpack plugin returns a table with as many records as its tabular input (T). 该表格的架构与表格输入的架构相同,但有以下修改:

  • The specified input column (Column) is removed.
  • The name of each column corresponds to the name of each slot, optionally prefixed by OutputColumnPrefix.
  • 如果同一槽的所有值具有相同类型,则每个列的类型都是槽的类型,或者 dynamic,如果值在类型上不同。
  • The schema is extended with as many columns as there are distinct slots in the top-level property bag values of T.

Note

  • If you don't specify the OutputSchema, the plugin output schema varies based on the input data values. 具有不同数据输入的插件的多个执行可以生成不同的输出架构。
  • If an OutputSchema is specified, the plugin returns only the columns defined in the Output schema syntax, unless a wildcard * is used.
  • To return all columns of the input data, and the columns defined in the OutputSchema, use a wildcard * in the OutputSchema.

表格架构规则适用于输入数据。 In particular:

  • An output column name can't be the same as an existing column in the tabular input T, unless it's the column to unpack (Column). 否则,输出包含两个具有相同名称的列。
  • All slot names, when prefixed by OutputColumnPrefix, must be valid entity names and follow the identifier naming rules.

该插件忽略 null 值。

Examples

本节中的示例演示如何使用语法帮助你入门。

展开包

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d)

Output

Age Name
20 John
40 Dave
30 Jasmine

展开包并使用 OutputColumnPrefix 选项生成带有前缀的列名称

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, 'Property_')

Output

Property_Age Property_Name
20 John
40 Dave
30 Jasmine

展开包并使用 columnsConflict 选项解决动态列与现有列之间的列冲突

datatable(Name:string, d:dynamic)
[
    'Old_name', dynamic({"Name": "John", "Age":20}),
    'Old_name', dynamic({"Name": "Dave", "Age":40}),
    'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='replace_source') // Use new name

Output

Age Name
20 John
40 Dave
30 Jasmine
datatable(Name:string, d:dynamic)
[
    'Old_name', dynamic({"Name": "John", "Age":20}),
    'Old_name', dynamic({"Name": "Dave", "Age":40}),
    'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='keep_source') // Keep old name

Output

Age Name
20 Old_name
40 Old_name
30 Old_name

展开包并使用 ignoredProperties 选项忽略属性包中的 2 个属性

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20, "Address": "Address-1" }),
    dynamic({"Name": "Dave", "Age":40, "Address": "Address-2"}),
    dynamic({"Name": "Jasmine", "Age":30, "Address": "Address-3"}),
]
// Ignore 'Age' and 'Address' properties
| evaluate bag_unpack(d, ignoredProperties=dynamic(['Address', 'Age']))

Output

Name
John
Dave
Jasmine

展开包并使用 OutputSchema 选项

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d) : (Name:string, Age:long)

Output

Name Age
John 20
Dave 40
Jasmine 30

使用 OutputSchema 展开包并使用通配符 * 选项

This query returns the original slot Description and the columns defined in the OutputSchema.

datatable(d:dynamic, Description: string)
[
    dynamic({"Name": "John", "Age":20, "height":180}), "Student",
    dynamic({"Name": "Dave", "Age":40, "height":160}), "Teacher",
    dynamic({"Name": "Jasmine", "Age":30, "height":172}), "Student",
]
| evaluate bag_unpack(d) : (*, Name:string, Age:long)

Output

Description Name Age
学生 John 20
Teacher Dave 40
学生 Jasmine 30

性能影响的示例

使用和不使用定义的 OutputSchema 扩展包以比较性能影响

This example uses a publicly available table in the help cluster. In the ContosoSales database, there's a table called SalesDynamic. The table contains sales data and includes a dynamic column named Customer_Properties.

SalesDynamic 表的屏幕截图,其中突出显示了“客户属性”列。

  • 没有输出架构的示例:第一个查询不定义 OutputSchema。 The query takes 5.84 seconds of CPU and scans 36.39 MB of data.

    SalesDynamic
    | evaluate bag_unpack(Customer_Properties) 
    | summarize Sales=sum(SalesAmount) by Country, State
    
  • 输出架构示例:第二个查询提供 OutputSchema。 The query takes 0.45 seconds of CPU and scans 19.31 MB of data. 查询不必分析输入表,从而节省处理时间。

    SalesDynamic
    | evaluate bag_unpack(Customer_Properties) :  (*, Country:string, State:string, City:string)
    | summarize Sales=sum(SalesAmount) by Country, State
    

Output

这两个查询的输出相同。 输出的前 10 行如下所示。

Canada British Columbia 56,101,083
United Kingdom England 77,288,747
Australia Victoria 31,242,423
Australia Queensland 27,617,822
Australia South Australia 8,530,537
Australia 新南威尔士州 54,765,786
Australia Tasmania 3,704,648
Canada Alberta 375,061
Canada Ontario 38,282
美国 Washington 80,544,870
... ... ...