数据映射Data mappings

数据映射在引入过程中使用,将传入数据映射到 Kusto 表中的列。Data mappings are used during ingestion to map incoming data to columns inside Kusto tables.

Kusto 支持不同类型的映射,包括 row-oriented(CSV、JSON 和 AVRO)和 column-oriented (Parquet)。Kusto supports different types of mappings, both row-oriented (CSV, JSON and AVRO), and column-oriented (Parquet).

映射列表中的每个元素都由以下三个属性构成:Each element in the mapping list is constructed from three properties:

属性Property 说明Description
column Kusto 表中的目标列名称Target column name in the Kusto table
datatype (可选)数据类型,如果 Kusto 表中不存在映射列时,则使用该数据类型创建映射列(Optional) Datatype with which to create the mapped column if it doesn't already exist in the Kusto table
Properties (可选)属性包,包含特定于每个映射的属性,如下一节中所述。(Optional) Property-bag containing properties specific for each mapping as described in each section below.

所有映射都可以是预先创建的,也可以是使用 ingestionMappingReference 参数从 ingest 命令引用的。All mappings can be pre-created and can be referenced from the ingest command using ingestionMappingReference parameters.

CSV 映射CSV mapping

如果源文件是 CSV(或任何分隔符分隔的格式),并且其架构与当前 Kusto 表架构不匹配,则 CSV 映射会将文件架构映射到 Kusto 表架构。When the source file is a CSV (or any delimeter-separated format) and its schema doesn't match the current Kusto table schema, a CSV mapping maps from the file schema to the Kusto table schema. 如果表在 Kusto 中不存在,则将根据此映射创建该表。If the table doesn't exist in Kusto, it will be created according to this mapping. 如果表中缺少映射中的某些字段,将添加这些字段。If some fields in the mapping are missing in the table, they will be added.

CSV 映射可以应用于所有分隔符分隔的格式:CSV、TSV、PSV、SCSV 和 SOHsv。CSV mapping can be applied on all the delimiter-separated formats: CSV, TSV, PSV, SCSV, and SOHsv.

列表中的每个元素都描述特定列的映射,并且可能包含以下属性:Each element in the list describes a mapping for a specific column, and may contain the following properties:

属性Property 说明Description
ordinal CSV 中的列顺序号The column order number in CSV
constantValue (可选)要用于列而不是 CSV 内某个值的常数值(Optional) The constant value to be used for a column instead of some value inside the CSV

备注

OrdinalConstantValue 互斥。Ordinal and ConstantValue are mutually exclusive.

CSV 映射示例Example of the CSV mapping

[
  { "column" : "rownumber", "Properties":{"Ordinal":"0"}},
  { "column" : "rowguid",   "Properties":{"Ordinal":"1"}},
  { "column" : "xdouble",   "Properties":{"Ordinal":"2"}},
  { "column" : "xbool",     "Properties":{"Ordinal":"3"}},
  { "column" : "xint32",    "Properties":{"Ordinal":"4"}},
  { "column" : "xint64",    "Properties":{"Ordinal":"5"}},
  { "column" : "xdate",     "Properties":{"Ordinal":"6"}},
  { "column" : "xtext",     "Properties":{"Ordinal":"7"}},
  { "column" : "const_val", "Properties":{"ConstValue":"Sample: constant value"}}
]

备注

当上述映射作为 .ingest 控制命令的一部分提供时,它将被序列化为 JSON 字符串。When the mapping above is provided as part of the .ingest control command it is serialized as a JSON string.

.ingest into Table123 (@"source1", @"source2")
    with 
    (
        format="csv", 
        ingestionMapping = 
        "["
            "{\"column\":\"rownumber\",\"Properties\":{\"Ordinal\":0}},"
            "{\"column\":\"rowguid\",  \"Properties\":{\"Ordinal\":1}}"
        "]" 
    )

备注

如果上述映射是预先创建的,可使用 .ingest 控制命令引用它:When the mapping above is pre-created it can be referenced in the .ingest control command:

.ingest into Table123 (@"source1", @"source2")
    with 
    (
        format="csv", 
        ingestionMappingReference = "Mapping1"
    )

备注

不推荐使用以下没有 Properties 属性包的映射格式。The following mapping format, without the Properties property-bag, is deprecated.

.ingest into Table123 (@"source1", @"source2")
    with 
    (
        format="csv", 
        ingestionMapping = 
        "["
            "{\"column\":\"rownumber\",\"Ordinal\": 0},"
            "{\"column\":\"rowguid\",  \"Ordinal\": 1}"
        "]" 
    )

JSON 映射JSON mapping

当源文件采用 JSON 格式时,文件内容将映射到 Kusto 表。When the source file is in JSON format, the file content is mapped to the Kusto table. 表必须存在于 Kusto 数据库中,除非为映射的所有列指定有效的数据类型。The table must exist in the Kusto database unless a valid datatype is specified for all the columns mapped. 在 JSON 映射中,被映射的列必须存在于 Kusto 表中,除非为所有不存在的列指定了数据类型。The columns mapped in the JSON mapping must exist in the Kusto table unless a datatype is specified for all the non-existing columns.

列表中的每个元素都描述特定列的映射,并且可能包含以下属性:Each element in the list describes a mapping for a specific column, and may contain the following properties:

属性Property 说明Description
path 如果开头为 $,指向将成为 JSON 文档中列内容的字段的 JSON 路径(指示整个文档的 JSON 路径为 $)。If starts with $: JSON path to the field that will become the content of the column in the JSON document (JSON path that denotes the entire document is $). 如果值不以 $ 开头,则使用常数值。If the value does not start with $: a constant value is used. 包含空格的 JSON 路径应作为 ['Property Name'] 进行转义。JSON paths that include white-spaces should be escaped as ['Property Name'].
transform (可选)应通过映射转换应用于内容的转换。(Optional) Transformation that should be applied on the content with mapping transformations.

JSON 映射示例Example of JSON mapping

[
  { "column" : "rownumber",   "Properties":{"Path":"$.rownumber"}}, 
  { "column" : "rowguid",     "Properties":{"Path":"$.rowguid"}}, 
  { "column" : "xdouble",     "Properties":{"Path":"$.xdouble"}}, 
  { "column" : "xbool",       "Properties":{"Path":"$.xbool"}}, 
  { "column" : "xint32",      "Properties":{"Path":"$.xint32"}}, 
  { "column" : "xint64",      "Properties":{"Path":"$.xint64"}}, 
  { "column" : "xdate",       "Properties":{"Path":"$.xdate"}}, 
  { "column" : "xtext",       "Properties":{"Path":"$.xtext"}}, 
  { "column" : "location",    "Properties":{"transform":"SourceLocation"}}, 
  { "column" : "lineNumber",  "Properties":{"transform":"SourceLineNumber"}}, 
  { "column" : "full_record", "Properties":{"Path":"$"}}
]

备注

当上述映射作为 .ingest 控制命令的一部分提供时,它将被序列化为 JSON 字符串。When the mapping above is provided as part of the .ingest control command it is serialized as JSON string.

.ingest into Table123 (@"source1", @"source2") 
  with 
  (
      format = "json", 
      ingestionMapping = 
      "["
        "{\"column\":\"rownumber\",\"Properties\":{\"Path\":\"$.rownumber\"}},"
        "{\"column\":\"rowguid\",  \"Properties\":{\"Path\":\"$.rowguid\"}}",
        "{\"column\":\"custom_column\",  \"Properties\":{\"Path\":\"$.[\'property name with space\']\"}}"
      "]"
  )

备注

如果上述映射是预先创建的,可使用 .ingest 控制命令引用它:When the mapping above is pre-created it can be referenced in the .ingest control command:

.ingest into Table123 (@"source1", @"source2")
    with 
    (
        format="json", 
        ingestionMappingReference = "Mapping1"
    )

备注

不推荐使用以下没有 Properties 属性包的映射格式。The following mapping format, without the Properties property-bag, is deprecated.

.ingest into Table123 (@"source1", @"source2") 
  with 
  (
      format = "json", 
      ingestionMapping = 
      "["
        "{\"column\":\"rownumber\",\"path\":\"$.rownumber\"},"
        "{\"column\":\"rowguid\",  \"path\":\"$.rowguid\"}"
      "]"
  )

Avro 映射Avro mapping

当源文件采用 Avro 格式时,Avro 文件内容将映射到 Kusto 表。When the source file is in Avro format, the Avro file content is mapped to the Kusto table. 表必须存在于 Kusto 数据库中,除非为映射的所有列指定有效的数据类型。The table must exist in the Kusto database unless a valid datatype is specified for all the columns mapped. 在 Avro 映射中,被映射的列必须存在于 Kusto 表中,除非为所有不存在的列指定了数据类型。The columns mapped in the Avro mapping must exist in the Kusto table unless a datatype is specified for all the non-existing columns.

列表中的每个元素都描述特定列的映射,并且可能包含以下属性:Each element in the list describes a mapping for a specific column, and may contain the following properties:

属性Property 说明Description
Field Avro 记录中字段的名称。The name of the field in the Avro record.
Path 此外还可使用 field,它允许采用 Avro 记录字段的内部部分(如有必要)。Alternative to using field which allows taking the inner part of an Avro record-field, if necessary. 该值表示从记录的根开始的 JSON 路径。The value denotes a JSON-path from the root of the record. 有关详细信息,请参阅以下备注。See the Notes below for more information. 包含空格的 JSON 路径应作为 ['Property Name'] 进行转义。JSON paths that include white-spaces should be escaped as ['Property Name'].
transform (可选)应通过支持的转换应用于内容的转换。(Optional) Transformation that should be applied on the content with supported transformations.

说明Notes

备注

  • fieldpath 不能一起使用,只允许使用一个。field and path cannot be used together,only one is allowed.
  • path 不能仅指向根 $,它必须具有至少一级路径。path cannot point to root $ only, it must have at least one level of path.

以下两种替代方法是等效的:The two alternatives below are equal:

[
  {"column": "rownumber", "Properties":{"Path":"$.RowNumber"}}
]
[
  {"column": "rownumber", "Properties":{"Field":"RowNumber"}}
]

AVRO 映射示例Example of the AVRO mapping

[
  {"column": "rownumber", "Properties":{"Field":"rownumber"}},
  {"column": "rowguid",   "Properties":{"Field":"rowguid"}},
  {"column": "xdouble",   "Properties":{"Field":"xdouble"}},
  {"column": "xboolean",  "Properties":{"Field":"xboolean"}},
  {"column": "xint32",    "Properties":{"Field":"xint32"}},
  {"column": "xint64",    "Properties":{"Field":"xint64"}},
  {"column": "xdate",     "Properties":{"Field":"xdate"}},
  {"column": "xtext",     "Properties":{"Field":"xtext"}}
]

备注

当上述映射作为 .ingest 控制命令的一部分提供时,它将被序列化为 JSON 字符串。When the mapping above is provided as part of the .ingest control command it is serialized as JSON string.

.ingest into Table123 (@"source1", @"source2") 
  with 
  (
      format = "avro", 
      ingestionMapping = 
      "["
        "{\"column\":\"rownumber\",\"Properties\":{\"Path\":\"$.rownumber\"}},"
        "{\"column\":\"rowguid\",  \"Properties\":{\"Path\":\"$.rowguid\"}}"
      "]"
  )

备注

如果上述映射是预先创建的,可使用 .ingest 控制命令引用它:When the mapping above is pre-created it can be referenced in the .ingest control command:

.ingest into Table123 (@"source1", @"source2")
    with 
    (
        format="avro", 
        ingestionMappingReference = "Mapping1"
    )

备注

不推荐使用以下没有 Properties 属性包的映射格式。The following mapping format, without the Properties property-bag, is deprecated.

.ingest into Table123 (@"source1", @"source2") 
  with 
  (
      format = "avro", 
      ingestionMapping = 
      "["
        "{\"column\":\"rownumber\",\"field\":\"rownumber\"},"
        "{\"column\":\"rowguid\",  \"field\":\"rowguid\"}"
      "]"
  )

Parquet 映射Parquet mapping

当源文件采用 Parquet 格式时,文件内容将映射到 Kusto 表。When the source file is in Parquet format, the file content is mapped to the Kusto table. 表必须存在于 Kusto 数据库中,除非为映射的所有列指定有效的数据类型。The table must exist in the Kusto database unless a valid datatype is specified for all the columns mapped. 在 Parquet 映射中,被映射的列必须存在于 Kusto 表中,除非为所有不存在的列指定了数据类型。The columns mapped in the Parquet mapping must exist in the Kusto table unless a datatype is specified for all the non-existing columns.

列表中的每个元素都描述特定列的映射,并且可能包含以下属性:Each element in the list describes a mapping for a specific column, and may contain the following properties:

属性Property 说明Description
path 如果开头为 $,指向将成为 Parquet 文档中列内容的字段的 JSON 路径(指示整个文档的 JSON 路径为 $)。If starts with $: JSON path to the field that will become the content of the column in the Parquet document (JSON path that denotes the entire document is $). 如果值不以 $ 开头,则使用常数值。If the value does not start with $: a constant value is used. 包含空格的 JSON 路径应作为 ['Property Name'] 进行转义。JSON paths that include white-spaces should be escaped as ['Property Name'].
transform (可选)应该应用于内容的映射转换(Optional) mapping transformations that should be applied on the content.

Parquet 映射示例Example of the Parquet mapping

[
  { "column" : "rownumber",   "Properties":{"Path":"$.rownumber"}}, 
  { "column" : "xdouble",     "Properties":{"Path":"$.xdouble"}}, 
  { "column" : "xbool",       "Properties":{"Path":"$.xbool"}}, 
  { "column" : "xint64",      "Properties":{"Path":"$.xint64"}}, 
  { "column" : "xdate",       "Properties":{"Path":"$.xdate"}}, 
  { "column" : "xtext",       "Properties":{"Path":"$.xtext"}}, 
  { "column" : "location",    "Properties":{"transform":"SourceLocation"}}, 
  { "column" : "lineNumber",  "Properties":{"transform":"SourceLineNumber"}}, 
  { "column" : "full_record", "Properties":{"Path":"$"}}
]

备注

当上述映射作为 .ingest 控制命令的一部分提供时,它将被序列化为 JSON 字符串。When the mapping above is provided as part of the .ingest control command it is serialized as a JSON string.

.ingest into Table123 (@"source1", @"source2") 
  with 
  (
      format = "parquet", 
      ingestionMapping = 
      "["
        "{\"column\":\"rownumber\",\"Properties\":{\"Path\":\"$.rownumber\"}},"
        "{\"column\":\"rowguid\",  \"Properties\":{\"Path\":\"$.rowguid\"}}",
        "{\"column\":\"custom_column\",  \"Properties\":{\"Path\":\"$.[\'property name with space\']\"}}"
      "]"
  )

备注

如果上述映射是预先创建的,可使用 .ingest 控制命令引用它:When the mapping above is pre-created it can be referenced in the .ingest control command:

.ingest into Table123 (@"source1", @"source2")
    with 
    (
        format="parquet", 
        ingestionMappingReference = "Mapping1"
    )

Orc 映射Orc mapping

当源文件采用 Orc 格式时,文件内容将映射到 Kusto 表。When the source file is in Orc format, the file content is mapped to the Kusto table. 表必须存在于 Kusto 数据库中,除非为映射的所有列指定有效的数据类型。The table must exist in the Kusto database unless a valid datatype is specified for all the columns mapped. 在 Orc 映射中,被映射的列必须存在于 Kusto 表中,除非为所有不存在的列指定了数据类型。The columns mapped in the Orc mapping must exist in the Kusto table unless a datatype is specified for all the non-existing columns.

列表中的每个元素都描述特定列的映射,并且可能包含以下属性:Each element in the list describes a mapping for a specific column, and may contain the following properties:

属性Property 说明Description
path 如果开头为 $,指向将成为 Orc 文档中列内容的字段的 JSON 路径(指示整个文档的 JSON 路径为 $)。If starts with $: JSON path to the field that will become the content of the column in the Orc document (JSON path that denotes the entire document is $). 如果值不以 $ 开头,则使用常数值。If the value does not start with $: a constant value is used. 包含空格的 JSON 路径应作为 ['Property Name'] 进行转义。JSON paths that include white-spaces should be escaped as ['Property Name'].
transform (可选)应该应用于内容的映射转换(Optional) mapping transformations that should be applied on the content.

Orc 映射示例Example of Orc mapping

[
  { "column" : "rownumber",   "Properties":{"Path":"$.rownumber"}}, 
  { "column" : "xdouble",     "Properties":{"Path":"$.xdouble"}}, 
  { "column" : "xbool",       "Properties":{"Path":"$.xbool"}}, 
  { "column" : "xint64",      "Properties":{"Path":"$.xint64"}}, 
  { "column" : "xdate",       "Properties":{"Path":"$.xdate"}}, 
  { "column" : "xtext",       "Properties":{"Path":"$.xtext"}}, 
  { "column" : "location",    "Properties":{"transform":"SourceLocation"}}, 
  { "column" : "lineNumber",  "Properties":{"transform":"SourceLineNumber"}}, 
  { "column" : "full_record", "Properties":{"Path":"$"}}
]

备注

当上述映射作为 .ingest 控制命令的一部分提供时,它将被序列化为 JSON 字符串。When the mapping above is provided as part of the .ingest control command it is serialized as a JSON string.

.ingest into Table123 (@"source1", @"source2") 
  with 
  (
      format = "orc", 
      ingestionMapping = 
      "["
        "{\"column\":\"rownumber\",\"Properties\":{\"Path\":\"$.rownumber\"}},"
        "{\"column\":\"rowguid\",  \"Properties\":{\"Path\":\"$.rowguid\"}}",
        "{\"column\":\"custom_column\",  \"Properties\":{\"Path\":\"$.[\'property name with space\']\"}}"
      "]"
  )

备注

如果上述映射是预先创建的,可使用 .ingest 控制命令引用它:When the mapping above is pre-created it can be referenced in the .ingest control command:

.ingest into Table123 (@"source1", @"source2")
    with 
    (
        format="orc", 
        ingestionMappingReference = "Mapping1"
    )

映射转换Mapping transformations

一些数据格式映射(Parquet、JSON 和 Avro)支持简单且有用的引入时间转换。Some of the data format mappings (Parquet, JSON and Avro) support simple and useful ingest-time transformations. 如果在引入时需要进行更复杂的处理,可使用更新策略,该策略允许使用 KQL 表达式定义轻型处理。Where the scenario requires more complex processing at ingest time, use Update policy, which allows defining lightweight processing using KQL expression.

路径依赖转换Path-dependant transformation 说明Description ConditionsConditions
PropertyBagArrayToDictionary 将属性的 JSON 数组(例如 {events:[{"n1":"v1"},{"n2":"v2"}]})转换为字典,并将其序列化为有效的 JSON 文档(例如 {"n1":"v1","n2":"v2"})。Transforms JSON array of properties (e.g. {events:[{"n1":"v1"},{"n2":"v2"}]}) to dictionary and serializes it to valid JSON document (for example, {"n1":"v1","n2":"v2"}). 仅当使用 path 时才能应用Can be applied only when path is used
SourceLocation 提供数据的存储项目的名称,类型 string(例如 blob 的“BaseUri”字段)。Name of the storage artifact that provided the data, type string (for example, the blob's "BaseUri" field).
SourceLineNumber 相对于该存储项目的偏移量,类型 long(从“1”开始,按每条新记录递增)。Offset relative to that storage artifact, type long (starting with '1' and incrementing per new record).
DateTimeFromUnixSeconds 将表示 unix 时间的数字(从 1970-01-01 开始的秒数)转换为 UTC 日期时间字符串Converts number representing unix-time (seconds since 1970-01-01) to UTC datetime string
DateTimeFromUnixMilliseconds 将表示 unix 时间的数字(从 1970-01-01 开始的毫秒数)转换为 UTC 日期时间字符串Converts number representing unix-time (milliseconds since 1970-01-01) to UTC datetime string
DateTimeFromUnixMicroseconds 将表示 unix 时间的数字(从 1970-01-01 开始的微秒数)转换为 UTC 日期时间字符串Converts number representing unix-time (microseconds since 1970-01-01) to UTC datetime string
DateTimeFromUnixNanoseconds 将表示 unix 时间的数字(从 1970-01-01 开始的纳秒数)转换为 UTC 日期时间字符串Converts number representing unix-time (nanoseconds since 1970-01-01) to UTC datetime string