Update policy overview

The update policy instructs Kusto to automatically append data to a target table whenever new data is inserted into the source table. The appended data is produced by a transformation query that runs on the newly inserted data.

Overview of the update policy in Azure Data Explorer

For example, the policy lets you create one table as a filtered view of another table. The new table can have a different schema, retention policy, and so on.

The update policy is subject to the same restrictions and best practices as regular ingestion. The policy scales out with the size of the cluster, and works more efficiently when data is ingested in large bulks.

Note

The source table and the table for which the update policy is defined must be in the same database. The update policy function schema and the target table schema must match in their column names, types, and order.
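
One way to check the schema requirement before attaching a policy is to compare the output schema of the function with the schema of the target table. The sketch below assumes a hypothetical function MyUpdateFunction() and a hypothetical target table MyTargetTable; both names are placeholders.

```kusto
// Hypothetical names: MyUpdateFunction() and MyTargetTable are placeholders.
// Run each query separately and compare column names, types, and order.
MyUpdateFunction()
| getschema
| project ColumnOrdinal, ColumnName, ColumnType

MyTargetTable
| getschema
| project ColumnOrdinal, ColumnName, ColumnType
```

If the two results differ in any row, the function output or the target table schema needs to be adjusted before the policy is defined.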

Update policy's query

The update policy query runs in a special mode, in which it's automatically scoped to cover only the newly ingested records. You can't query the source table's already-ingested data as part of this query. However, data ingested in the "boundary" of transactional update policies does become available for a query in a single transaction. Because the update policy is defined on the destination table, ingesting data into one source table may result in multiple queries being run on that data. The order of execution of multiple update policies is undefined.

Query limitations

  • The query can invoke stored functions, but can't include cross-database or cross-cluster queries.
  • A query that runs as part of an update policy doesn't have read access to tables that have the RestrictedViewAccess policy or a Row Level Security policy enabled.
  • When referencing the Source table in the Query part of the policy, or in functions referenced by the Query part:
    • Don't use the qualified name of the table. Instead, use TableName.
    • Don't use database("DatabaseName").TableName or cluster("ClusterName").database("DatabaseName").TableName.
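
The naming rule above can be illustrated with a stored function that an update policy's Query would call. This is only a sketch: MySourceTable, MyUpdateFunction, and the Level column are hypothetical names chosen for illustration.

```kusto
// Recommended: reference the source table by its unqualified name.
// MySourceTable, MyUpdateFunction, and Level are hypothetical.
.create-or-alter function MyUpdateFunction() {
    MySourceTable
    | where Level == "Error"
}

// Avoid these forms inside an update policy query (shown as comments only):
// database("DatabaseName").MySourceTable | where Level == "Error"
// cluster("ClusterName").database("DatabaseName").MySourceTable | where Level == "Error"
```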

Warning

Defining an incorrect query in the update policy can prevent any data from being ingested into the source table.

The update policy object

A table may have zero, one, or more update policy objects associated with it. Each such object is represented as a JSON property bag, with the following properties defined.

| Property | Type | Description |
|---|---|---|
| IsEnabled | bool | States if the update policy is enabled (true) or disabled (false). |
| Source | string | Name of the table that triggers the update policy to be invoked. |
| Query | string | A Kusto CSL query that is used to produce the data for the update. |
| IsTransactional | bool | States if the update policy is transactional or not (defaults to false). Failure to run a transactional update policy results in the source table not being updated with new data. |
| PropagateIngestionProperties | bool | States if ingestion properties (extent tags and creation time) specified during the ingestion into the source table should also apply to the ones in the derived table. |
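
As a minimal sketch of how such an object might be attached to a target table, the commands below set up a filtered view of a source table. All table, column, and function names are hypothetical, and each command is meant to be run separately.

```kusto
// Hypothetical tables and function; run each command separately.
.create table MySourceTable (Timestamp: datetime, Level: string, Message: string)

.create table MyTargetTable (Timestamp: datetime, Message: string)

// The function's output columns match MyTargetTable in name, type, and order.
.create-or-alter function MyUpdateFunction() {
    MySourceTable
    | where Level == "Error"
    | project Timestamp, Message
}

// Attach a single update policy object (a JSON property bag inside an array) to the target table.
.alter table MyTargetTable policy update @'[{"IsEnabled": true, "Source": "MySourceTable", "Query": "MyUpdateFunction()", "IsTransactional": false, "PropagateIngestionProperties": false}]'
```

The policy is passed as a JSON array because a table may have more than one update policy object.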

Note

Cascading updates are allowed (TableA → TableB → TableC → ...).

However, if update policies are defined over multiple tables in a circular manner, the chain of updates is cut. This issue is detected at runtime. Data will be ingested only once to each table in the chain of affected tables.

Update policy commands

Commands to control the update policy include:

Update policy is initiated following ingestion

Update policies take effect when data is ingested or moved to (extents are created in) a defined source table using any of the following commands:

Regular ingestion using update policy

The update policy will behave like regular ingestion when the following conditions are met:

  • The source table is a high-rate trace table with interesting data formatted as a free-text column.
  • The target table on which the update policy is defined accepts only specific trace lines.
  • The target table has a well-structured schema, produced by transforming the original free-text data with the parse operator.
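
The scenario above could look roughly like the following sketch, in which a function filters free-text trace lines and uses the parse operator to produce typed columns. The table names, the OriginalRecord column, the function name ExtractMyLogs, and the trace line format are all assumptions made for illustration.

```kusto
// Hypothetical source table with a single free-text column.
.create table MySourceTable (OriginalRecord: string)

// Hypothetical target table with a structured schema.
.create table MyTargetTable (Timestamp: datetime, ThreadId: int, ProcessId: int, Message: string)

// Keep only the trace lines of interest and parse them into typed columns.
// Assumed line format: [<timestamp>] [ThreadId:<n>] [ProcessId:<n>] Message: <text>
.create-or-alter function ExtractMyLogs() {
    MySourceTable
    | where OriginalRecord contains "[ThreadId:"
    | parse OriginalRecord with "[" Timestamp:datetime "] [ThreadId:" ThreadId:int "] [ProcessId:" ProcessId:int "] Message: " Message:string
    | project Timestamp, ThreadId, ProcessId, Message
}
```

An update policy on MyTargetTable whose Query is ExtractMyLogs() would then append only the parsed trace lines to the target table.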

Zero retention on source table

Sometimes data is ingested to a source table only as a stepping stone to the target table, and you don't want to keep the raw data in the source table. Set a soft-delete period of 0 in the source table's retention policy, and set the update policy as transactional. In this situation:

  • The source data isn't queryable from the source table.
  • The source data isn't persisted to durable storage as part of the ingestion operation.
  • Operational performance will improve.
  • Post-ingestion resources for background grooming operations are reduced. These operations are done on extents in the source table.
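
The two settings described above might be applied as in the sketch below. MySourceTable, MyTargetTable, and MyUpdateFunction() are hypothetical names, and each command is meant to be run separately.

```kusto
// Hypothetical names; run each command separately.
// Set a soft-delete period of 0 on the source table's retention policy.
.alter-merge table MySourceTable policy retention softdelete = 0s

// Define the update policy on the target table as transactional.
.alter table MyTargetTable policy update @'[{"IsEnabled": true, "Source": "MySourceTable", "Query": "MyUpdateFunction()", "IsTransactional": true, "PropagateIngestionProperties": false}]'
```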

Performance impact

Update policies can affect the performance of a Kusto cluster. The update policy affects any ingestion into the source table. Each ingestion of data extents is multiplied by the number of target tables. As such, it's important that the Query part of the update policy is optimized to work well. You can test an update policy's additional performance impact on an ingestion operation. Invoke the policy on specific, already-existing extents before creating or altering the policy or the function it uses in its Query part.

Evaluate resource usage

Use .show queries to evaluate resource usage (CPU, memory, and so on) in the following scenario:

  • The source table name (the Source property of the update policy) is MySourceTable.
  • The Query property of the update policy calls a function named MyFunction().
.show table MySourceTable extents;
// The following line provides the extent ID for the not-yet-merged extent in the source table which has the most records
let extentId = $command_results | where MaxCreatedOn > ago(1hr) and MinCreatedOn == MaxCreatedOn | top 1 by RowCount desc | project ExtentId;
let MySourceTable = MySourceTable | where extent_id() == toscalar(extentId);
MyFunction()
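
After the test query has run, its resource usage could be looked up with a filter along the following lines. This is a sketch under assumptions: it presumes the test ran within the last 10 minutes, that its text mentions MyFunction, and that the output of .show queries exposes the columns named here.

```kusto
// Assumptions: the test query ran in the last 10 minutes and its text contains "MyFunction".
// The projected column names are assumed to be present in the .show queries output.
.show queries
| where StartedOn > ago(10m) and Text has "MyFunction"
| project StartedOn, Duration, TotalCpu, MemoryPeak, State
| order by StartedOn desc
```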

Failures

By default, failure to run the update policy doesn't affect the ingestion of data to the source table. However, if the update policy is defined as IsTransactional:true, failure to run the policy forces the ingestion of data into the source table to fail. In some cases, ingestion of data into the source table succeeds, but the update policy fails during ingestion to the target table.

Failures that occur while update policies run can be retrieved using the .show ingestion failures command.

.show ingestion failures 
| where FailedOn > ago(1hr) and OriginatesFromUpdatePolicy == true

Treatment of failures

Non-transactional policy

Kusto ignores the failure. Any retry is the responsibility of the data ingestion process owner.

Transactional policy

The original ingestion operation that triggered the update will also fail. The source table and the database won't be modified with new data. If the ingestion method is pull (Kusto's Data Management service is involved in the ingestion process), Kusto's Data Management service automatically retries the entire ingestion operation, according to the following logic:

  • Retries continue until either DataImporterMaximumRetryPeriod (default = 2 days) or DataImporterMaximumRetryAttempts (default = 10) is reached, whichever comes first.
  • Both of the above settings can be altered in the Data Management service's configuration.
  • The backoff period starts at 2 minutes, and grows exponentially (2 -> 4 -> 8 -> 16 ... minutes).
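
To get a feel for how the two limits interact, the exponential schedule can be tabulated with a short query. This is an illustration only: it assumes the delay doubles on every retry and that no upper cap applies, which the description above doesn't state either way. The column names are made up for the example.

```kusto
// Illustration only: wait before each retry, assuming the delay doubles every time and is never capped.
range Retry from 1 to 10 step 1
| extend BackoffMinutes = tolong(pow(2, Retry))
| serialize CumulativeMinutes = row_cumsum(BackoffMinutes)
```

Comparing CumulativeMinutes with the configured retry period shows which of the two limits would be reached first under these assumptions.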

In any other case, any retry is the responsibility of the data owner.