适用于:✅Azure 数据资源管理器
bag_unpack
插件通过将每个属性包顶级槽视为列来解压缩 dynamic
类型的单个列。 该插件通过 evaluate
运算符调用。
Syntax
T|
evaluate
bag_unpack(
Column [,
OutputColumnPrefix ] [,
columnsConflict ] [,
ignoredProperties ] )
[:
OutputSchema]
Learn more about syntax conventions.
Parameters
Name | 类型 | Required | Description |
---|---|---|---|
T | string |
✔️ | The tabular input whose column Column is to be unpacked. |
Column | dynamic |
✔️ | The column of T to unpack. |
OutputColumnPrefix | string |
添加到插件生成的所有列的常见前缀。 | |
columnsConflict | string |
有关解决列冲突的指导。 Valid values: error - 查询生成错误(默认值)replace_source -源列被替换keep_source - 源列被保留 |
|
ignoredProperties | dynamic |
要忽略的一组可选包属性。 } | |
OutputSchema | 指定插件输出的 bag_unpack 列名称和类型。 有关语法信息,请参阅 输出架构语法,并了解含义,请参阅 性能注意事项。 |
输出架构语法
(
ColumnName:
ColumnType [,
...] )
使用通配符 *
作为第一个参数将源表的所有列包含在输出中,如下所示:
(
*
,
ColumnName:
ColumnType [,
...] )
Performance considerations
Using the plugin without an OutputSchema can have severe performance implications in large datasets and should be avoided.
Providing an OutputSchema allows the query engine to optimize the query execution, as it can determine the output schema without needing to parse and analyze the input data. 当输入数据很大或很复杂时,这非常有用。 请参阅 示例,其中显示了 在未定义 OutputSchema 的情况下使用插件的性能影响。
Returns
The bag_unpack
plugin returns a table with as many records as its tabular input (T). 该表格的架构与表格输入的架构相同,但有以下修改:
- The specified input column (Column) is removed.
- The name of each column corresponds to the name of each slot, optionally prefixed by OutputColumnPrefix.
- 如果同一槽的所有值具有相同类型,则每个列的类型都是槽的类型,或者
dynamic
,如果值在类型上不同。 - The schema is extended with as many columns as there are distinct slots in the top-level property bag values of T.
Note
- If you don't specify the OutputSchema, the plugin output schema varies based on the input data values. 具有不同数据输入的插件的多个执行可以生成不同的输出架构。
- If an OutputSchema is specified, the plugin returns only the columns defined in the Output schema syntax, unless a wildcard
*
is used. - To return all columns of the input data, and the columns defined in the OutputSchema, use a wildcard
*
in the OutputSchema.
表格架构规则适用于输入数据。 In particular:
- An output column name can't be the same as an existing column in the tabular input T, unless it's the column to unpack (Column). 否则,输出包含两个具有相同名称的列。
- All slot names, when prefixed by OutputColumnPrefix, must be valid entity names and follow the identifier naming rules.
该插件忽略 null 值。
Examples
本节中的示例演示如何使用语法帮助你入门。
展开包:
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d)
Output
Age | Name |
---|---|
20 | John |
40 | Dave |
30 | Jasmine |
展开包并使用 OutputColumnPrefix
选项生成带有前缀的列名称:
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, 'Property_')
Output
Property_Age | Property_Name |
---|---|
20 | John |
40 | Dave |
30 | Jasmine |
展开包并使用 columnsConflict
选项解决动态列与现有列之间的列冲突:
datatable(Name:string, d:dynamic)
[
'Old_name', dynamic({"Name": "John", "Age":20}),
'Old_name', dynamic({"Name": "Dave", "Age":40}),
'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='replace_source') // Use new name
Output
Age | Name |
---|---|
20 | John |
40 | Dave |
30 | Jasmine |
datatable(Name:string, d:dynamic)
[
'Old_name', dynamic({"Name": "John", "Age":20}),
'Old_name', dynamic({"Name": "Dave", "Age":40}),
'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='keep_source') // Keep old name
Output
Age | Name |
---|---|
20 | Old_name |
40 | Old_name |
30 | Old_name |
展开包并使用 ignoredProperties
选项忽略属性包中的 2 个属性:
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20, "Address": "Address-1" }),
dynamic({"Name": "Dave", "Age":40, "Address": "Address-2"}),
dynamic({"Name": "Jasmine", "Age":30, "Address": "Address-3"}),
]
// Ignore 'Age' and 'Address' properties
| evaluate bag_unpack(d, ignoredProperties=dynamic(['Address', 'Age']))
Output
Name |
---|
John |
Dave |
Jasmine |
展开包并使用 OutputSchema 选项:
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d) : (Name:string, Age:long)
Output
Name | Age |
---|---|
John | 20 |
Dave | 40 |
Jasmine | 30 |
使用 OutputSchema 展开包并使用通配符 *
选项:
This query returns the original slot Description and the columns defined in the OutputSchema.
datatable(d:dynamic, Description: string)
[
dynamic({"Name": "John", "Age":20, "height":180}), "Student",
dynamic({"Name": "Dave", "Age":40, "height":160}), "Teacher",
dynamic({"Name": "Jasmine", "Age":30, "height":172}), "Student",
]
| evaluate bag_unpack(d) : (*, Name:string, Age:long)
Output
Description | Name | Age |
---|---|---|
学生 | John | 20 |
Teacher | Dave | 40 |
学生 | Jasmine | 30 |
性能影响的示例
使用和不使用定义的 OutputSchema 扩展包以比较性能影响:
This example uses a publicly available table in the help cluster. In the ContosoSales database, there's a table called SalesDynamic. The table contains sales data and includes a dynamic column named Customer_Properties.
没有输出架构的示例:第一个查询不定义 OutputSchema。 The query takes 5.84 seconds of CPU and scans 36.39 MB of data.
SalesDynamic | evaluate bag_unpack(Customer_Properties) | summarize Sales=sum(SalesAmount) by Country, State
输出架构示例:第二个查询提供 OutputSchema。 The query takes 0.45 seconds of CPU and scans 19.31 MB of data. 查询不必分析输入表,从而节省处理时间。
SalesDynamic | evaluate bag_unpack(Customer_Properties) : (*, Country:string, State:string, City:string) | summarize Sales=sum(SalesAmount) by Country, State
Output
这两个查询的输出相同。 输出的前 10 行如下所示。
Canada | British Columbia | 56,101,083 |
---|---|---|
United Kingdom | England | 77,288,747 |
Australia | Victoria | 31,242,423 |
Australia | Queensland | 27,617,822 |
Australia | South Australia | 8,530,537 |
Australia | 新南威尔士州 | 54,765,786 |
Australia | Tasmania | 3,704,648 |
Canada | Alberta | 375,061 |
Canada | Ontario | 38,282 |
美国 | Washington | 80,544,870 |
... | ... | ... |