Ingestion Rules

JSON Flattening, Escaping, and Array Handling

Your Azure Time Series Insights Gen2 environment dynamically creates the columns of your warm and cold stores, following a particular set of naming conventions. When an event is ingested, a set of rules is applied to the JSON payload and property names. These include escaping certain special characters and flattening nested JSON objects. It's important to know these rules so that you understand how the shape of your JSON influences how your events are stored and queried. See the table below for the full list of rules. Examples A and B also demonstrate how you can efficiently batch multiple time series in an array.

Important

  • Review the rules below before selecting a Time Series ID property and/or your event source timestamp property (or properties). If your TS ID or timestamp is within a nested object or contains one or more of the special characters below, make sure that the property name you provide matches the column name after the ingestion rules have been applied. See Example B below.
| Rule | Example JSON | Time Series Expression syntax | Property column name in Parquet |
|---|---|---|---|
| The Azure Time Series Insights Gen2 data type is appended to the end of your column name as `_<dataType>`. | `"type": "Accumulated Heat"` | `$event.type.String` | `type_string` |
| The event source timestamp property is saved in Azure Time Series Insights Gen2 as "timestamp" in storage, and the value is stored in UTC. You can customize your event source timestamp property to meet the needs of your solution, but the column name in warm and cold storage is "timestamp". Other datetime JSON properties that are not the event source timestamp are saved with "_datetime" in the column name, as described in the rule above. | `"ts": "2020-03-19 14:40:38.318"` | `$event.$ts` | `timestamp` |
| JSON property names that include the special characters `.` `[` `\` and `'` are escaped using `['` and `']`. | `"id.wasp": "6A3090FD337DE6B"` | `$event['id.wasp'].String` | `['id.wasp']_string` |
| Within `['` and `']`, single quotes and backslashes are escaped further: a single quote is written as `\'` and a backslash as `\\`. | `"Foo's Law Value": "17.139999389648"` | `$event['Foo\'s Law Value'].Double` | `['Foo\'s Law Value']_double` |
| Nested JSON objects are flattened with a period as the separator. Nesting up to 10 levels is supported. | `"series": {"value" : 316 }` | `$event.series.value.Long`, `$event['series']['value'].Long` or `$event.series['value'].Long` | `series.value_long` |
| Arrays of primitive types are stored as the Dynamic type. | `"values": [154, 149, 147]` | Dynamic types can only be retrieved via the GetEvents API. | `values_dynamic` |
| Arrays containing objects have two behaviors, depending on the object content. If either the TS ID(s) or the timestamp property (or properties) are within the objects in an array, the array is unrolled so that the initial JSON payload produces multiple events. This lets you batch multiple events into one JSON structure. Any top-level properties that are peers to the array are saved with each unrolled object. If your TS ID(s) and timestamp are not within the array, it is saved whole as the Dynamic type. | See Examples A, B, and C below. | | |
| Arrays containing mixed elements aren't flattened. | `"values": ["foo", {"bar" : 149}, 147]` | Dynamic types can only be retrieved via the GetEvents API. | `values_dynamic` |
| A JSON property name can be at most 512 characters long. A longer name is truncated to 512 characters and `_<hashCode>` is appended. This also applies to property names that have been concatenated from a flattened object and denote a nested object path. | `"data.items.datapoints.values.telemetry<...continuing to over 512 chars>" : 12.3440495` | `"$event.data.items.datapoints.values.telemetry<...continuing to include all chars>.Double"` | `data.items.datapoints.values.telemetry<...continuing to 512 chars>_912ec803b2ce49e4a541068d495ab570_double` |
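
The escaping, flattening, and type-suffix rules in the table can be approximated in a few lines of Python. This is only an illustrative sketch, not the service's implementation: the function names `escape_segment`, `type_suffix`, and `flatten` are hypothetical, the type mapping is a simplified guess based on Python types, and the sketch deliberately ignores datetime detection, the event source timestamp, the 10-level nesting limit, the 512-character truncation, and array unrolling. How an escaped segment combines with a nested path is not shown in the table, so joining with a period is an assumption.

```python
# Characters that trigger ['...'] escaping, per the rules table.
SPECIAL_CHARS = set(".[\\'")


def escape_segment(name: str) -> str:
    """Escape a single JSON property name (hypothetical helper)."""
    if not any(ch in SPECIAL_CHARS for ch in name):
        return name
    # Inside [' and '], backslashes and single quotes get a leading backslash.
    inner = name.replace("\\", "\\\\").replace("'", "\\'")
    return "['" + inner + "']"


def type_suffix(value) -> str:
    """Map a Python value to a column suffix (simplified assumption)."""
    if isinstance(value, bool):  # bool must be checked before int
        return "bool"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):
        return "dynamic"
    return "string"


def flatten(obj: dict, prefix: str = "") -> dict:
    """Return {column_name: value}, flattening nested objects with a period."""
    columns = {}
    for key, value in obj.items():
        path = prefix + escape_segment(key)
        if isinstance(value, dict):
            columns.update(flatten(value, path + "."))
        else:
            columns[f"{path}_{type_suffix(value)}"] = value
    return columns


event = {
    "type": "Accumulated Heat",
    "id.wasp": "6A3090FD337DE6B",
    "series": {"value": 316},
    "values": [154, 149, 147],
}
for column in flatten(event):
    print(column)
```

Running the sketch on the sample event lists the column names `type_string`, `['id.wasp']_string`, `series.value_long`, and `values_dynamic`, which match the corresponding rows of the table.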

Understanding the dual behavior for arrays

Arrays of objects are either stored whole or split into multiple events, depending on how you've modeled your data. This allows you to use an array to batch events and avoid repeating telemetry properties that are defined at the root object level. Batching can be advantageous because fewer Event Hubs or IoT Hub messages are sent.

However, in some cases, arrays containing objects are only meaningful in the context of other values, and creating multiple events would render the data meaningless. To ensure that an array of objects is stored as-is as a dynamic type, follow the data modeling guidance below and take a look at Example C.

How do I know if my array of objects will produce multiple events?

If one or more of your Time Series ID properties is nested within objects in an array, or if your event source timestamp property is nested, the ingestion engine splits the array to create multiple events. The property names that you provide for your TS ID(s) and/or timestamp should follow the flattening rules above, and therefore indicate the shape of your JSON. See the examples below, and check out the guide on how to select a Time Series ID property.
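
As a rough way to reason about this, the check can be expressed as a prefix test on the configured property names. The sketch below is an assumption-laden illustration, not the ingestion engine's logic: `array_is_unrolled` is a hypothetical helper, and it assumes the configured names are already written in flattened form (for example "values.time") and contain no escaped characters.

```python
from typing import Sequence


def array_is_unrolled(
    array_name: str,
    ts_id_properties: Sequence[str],
    timestamp_property: str,
) -> bool:
    """Return True if a TS ID or the timestamp lives inside the named array.

    A configured name such as "values.time" points inside the "values" array,
    which is the condition described above for splitting into multiple events.
    """
    prefix = array_name + "."
    configured = [*ts_id_properties, timestamp_property]
    return any(name.startswith(prefix) for name in configured)


# Configurations taken from Examples A, B, and C below.
print(array_is_unrolled("values", ["id"], "values.time"))                      # True
print(array_is_unrolled("telemetry", ["plantId", "telemetry.tagId"], "timestamp"))  # True
print(array_is_unrolled("datapoints", ["id"], "timestamp"))                    # False
```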

Example A:

Time Series ID at the object root and timestamp nested
Environment Time Series ID: "id"
Event source timestamp: "values.time"
JSON payload:

[
    {
        "id": "caaae533-1d6c-4f58-9b75-da102bcc2c8c",
        "values": [
            {
                "time": "2020-05-01T00:59:59.000Z",
                "value": 25.6073
            },
            {
                "time": "2020-05-01T01:00:29.000Z",
                "value": 43.9077
            }
        ]
    },
    {
        "id": "1ac87b74-0865-4a07-b512-56602a3a576f",
        "values": [
            {
                "time": "2020-05-01T00:59:59.000Z",
                "value": 0.337288
            },
            {
                "time": "2020-05-01T01:00:29.000Z",
                "value": 4.76562
            }
        ]
    }
]

Result in Parquet file:
The configuration and payload above will produce three columns and four events.

| timestamp | id_string | values.value_double |
|---|---|---|
| 2020-05-01T00:59:59.000Z | caaae533-1d6c-4f58-9b75-da102bcc2c8c | 25.6073 |
| 2020-05-01T01:00:29.000Z | caaae533-1d6c-4f58-9b75-da102bcc2c8c | 43.9077 |
| 2020-05-01T00:59:59.000Z | 1ac87b74-0865-4a07-b512-56602a3a576f | 0.337288 |
| 2020-05-01T01:00:29.000Z | 1ac87b74-0865-4a07-b512-56602a3a576f | 4.76562 |
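
The unrolling in Example A can be imitated with a short Python sketch. It is a simplified illustration under stated assumptions rather than the actual ingestion code: `unroll` is a hypothetical function, data-type suffixes such as `_string` and `_double` are omitted from the keys, and only a single array per event is handled.

```python
def unroll(event: dict, array_name: str, timestamp_property: str) -> list:
    """Split one event into one row per object in event[array_name].

    Top-level peers of the array are copied onto every row, and the configured
    timestamp property is stored under the fixed name "timestamp".
    """
    peers = {k: v for k, v in event.items() if k != array_name}
    rows = []
    for item in event[array_name]:
        row = dict(peers)
        for key, value in item.items():
            flattened = f"{array_name}.{key}"
            column = "timestamp" if flattened == timestamp_property else flattened
            row[column] = value
        rows.append(row)
    return rows


# Payload from Example A.
payload = [
    {
        "id": "caaae533-1d6c-4f58-9b75-da102bcc2c8c",
        "values": [
            {"time": "2020-05-01T00:59:59.000Z", "value": 25.6073},
            {"time": "2020-05-01T01:00:29.000Z", "value": 43.9077},
        ],
    },
    {
        "id": "1ac87b74-0865-4a07-b512-56602a3a576f",
        "values": [
            {"time": "2020-05-01T00:59:59.000Z", "value": 0.337288},
            {"time": "2020-05-01T01:00:29.000Z", "value": 4.76562},
        ],
    },
]

events = [row for event in payload for row in unroll(event, "values", "values.time")]
for row in events:
    print(row)
```

The two payload objects yield four rows in total, each carrying the root-level "id" alongside one timestamp and one value, which mirrors the four events in the table above.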

Example B:

Composite Time Series ID with one property nested
Environment Time Series ID: "plantId" and "telemetry.tagId"
Event source timestamp: "timestamp"
JSON payload:

[
    {
        "plantId": "9336971",
        "timestamp": "2020-01-22T16:38:09Z",
        "telemetry": [
            {
                "tagId": "100231-A-A6",
                "value": -31.149018
            },
            {
                "tagId": "100231-A-A1",
                "value": 20.560796
            },
            {
                "tagId": "100231-A-A9",
                "value": 177
            },
            {
                "tagId": "100231-A-A8",
                "value": 420
            }
        ]
    },
    {
        "plantId": "9336971",
        "timestamp": "2020-01-22T16:42:14Z",
        "telemetry": [
            {
                "tagId": "103585-A-A7",
                "value": -30.9918
            },
            {
                "tagId": "103585-A-A4",
                "value": 19.960796
            }
        ]
    }
]

Result in Parquet file:
The configuration and payload above will produce four columns and six events.

| timestamp | plantId_string | telemetry.tagId_string | telemetry.value_double |
|---|---|---|---|
| 2020-01-22T16:38:09Z | 9336971 | 100231-A-A6 | -31.149018 |
| 2020-01-22T16:38:09Z | 9336971 | 100231-A-A1 | 20.560796 |
| 2020-01-22T16:38:09Z | 9336971 | 100231-A-A9 | 177 |
| 2020-01-22T16:38:09Z | 9336971 | 100231-A-A8 | 420 |
| 2020-01-22T16:42:14Z | 9336971 | 103585-A-A7 | -30.9918 |
| 2020-01-22T16:42:14Z | 9336971 | 103585-A-A4 | 19.960796 |

Example C:

Time Series ID and timestamp are at the object root
Environment Time Series ID: "id"
Event source timestamp: "timestamp"
JSON payload:

{
    "id": "800500054755",
    "timestamp": "2020-11-01T10:00:00.000Z",
    "datapoints": [{
            "value": 120
        },
        {
            "value": 124
        }
    ]
}

Result in Parquet file:
The configuration and payload above will produce three columns and one event.

| timestamp | id_string | datapoints_dynamic |
|---|---|---|
| 2020-11-01T10:00:00.000Z | 800500054755 | [{"value": 120},{"value":124}] |

Next steps