Continuous data export

Continuously export data from Kusto to an external table. The external table defines the destination (for example, Azure Blob Storage) and the schema of the exported data. The exported data is defined by a query that runs periodically, and the results are stored in the external table. The process guarantees that all records are exported "exactly once" (excluding dimension tables, in which all records are evaluated in all executions).

Continuous data export requires you to create an external table and then create a continuous export definition that points to the external table.
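
The two-step flow above might look roughly like the following sketch. The column list (Timestamp, Message), the source table T, and the storage placeholders are assumptions made for illustration; check the external table creation reference for the exact syntax supported by your cluster.

```kusto
// Step 1: define the external table (destination and schema).
// The columns, storage URI, and key below are placeholders, not real values.
.create external table ExternalBlob (Timestamp:datetime, Message:string)
kind=blob
dataformat=csv
(
    h@'https://<storage-account>.blob.core.chinacloudapi.cn/<container>;<storage-account-key>'
)

// Step 2: define the continuous export that writes to that external table.
// The query output schema must match the external table schema.
.create-or-alter continuous-export MyExport
to table ExternalBlob
with (intervalBetweenRuns=1h)
<| T | project Timestamp, Message
```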

Note

  • Kusto doesn't support exporting historical records that were ingested before the continuous export was created (as part of continuous export). Historical records can be exported separately using the (non-continuous) export command. For more information, see Exporting historical data.
  • Continuous export doesn't work for data ingested using streaming ingestion.
  • Currently, continuous export can't be configured on a table on which a Row Level Security policy is enabled.
  • Continuous export isn't supported for external tables with impersonate in their connection strings.

Notes

  • The guarantee for "exactly once" export applies only to files reported in the show exported artifacts command. Continuous export doesn't guarantee that each record will be written only once to the external table. If a failure occurs after export has begun and some of the artifacts were already written to the external table, the external table may contain duplicates (or even corrupted files, if a write operation was aborted before completion). In such cases, artifacts aren't deleted from the external table, but they won't be reported in the show exported artifacts command. Consuming the exported files using the show exported artifacts command guarantees no duplicates (and no corrupted files).

  • To guarantee "exactly once" export, continuous export uses database cursors. The IngestionTime policy must be enabled on all tables referenced in the query that should be processed "exactly once" in the export. The policy is enabled by default on all newly created tables.

  • The output schema of the export query must match the schema of the external table to which you export.

  • Continuous export doesn't support cross-database/cluster calls.

  • Continuous export runs according to the time period configured for it. The recommended value for this interval is at least several minutes, depending on the latencies you're willing to accept. Continuous export isn't designed for constantly streaming data out of Kusto. It runs in a distributed mode, where all nodes export concurrently. If the range of data queried by each run is small, the output of the continuous export will be many small artifacts (the number depends on the number of nodes in the cluster).

  • The number of export operations that can run concurrently is limited by the cluster's data export capacity (see throttling). If the cluster doesn't have sufficient capacity to handle all continuous exports, some will start lagging behind.

  • By default, all tables referenced in the export query are assumed to be fact tables. Therefore, they are scoped to the database cursor: the export query includes only the records that were added since the previous export execution. The export query may also contain dimension tables, in which case all records of the dimension table are included in all export queries.

    • When using joins between fact and dimension tables in continuous export, keep in mind that records in the fact table are processed only once. If the export runs while records in the dimension tables are missing for some keys, records for the respective keys will either be missed or include null values for the dimension columns in the exported files (depending on whether the query uses an inner or outer join). The forcedLatency property in the continuous export definition can be useful for such cases, where the fact and dimension tables are ingested at around the same time (for matching records).
    • Continuous export of only dimension tables isn't supported. The export query must include at least one fact table.
    • The syntax explicitly declares which tables are scoped (fact) and which aren't scoped (dimension). See the over parameter in the create command for details.
  • The number of files exported in each continuous export iteration depends on how the external table is partitioned. For more information, see the notes section in the export to external table command. Each continuous export iteration always writes to new files, and never appends to existing ones. As a result, the number of exported files also depends on how frequently the continuous export runs (the intervalBetweenRuns parameter).
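
Because the "exactly once" guarantee depends on the IngestionTime policy, it can be worth checking the policy on each fact table before creating the export. A minimal sketch, assuming a fact table named T:

```kusto
// Check whether the IngestionTime policy is enabled on table T.
.show table T policy ingestiontime

// Enable the policy if it is not already enabled.
.alter table T policy ingestiontime true
```

Records ingested while the policy was disabled have no ingestion time, so this check is most useful before data that needs to be exported starts arriving.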

All of the continuous export commands require database admin permissions.

Create or alter continuous export

Syntax:

.create-or-alter continuous-export ContinuousExportName
[ over (T1, T2 )]
to table ExternalTableName
[ with (PropertyName = PropertyValue,...)]
<| Query

Properties:

| Property | Type | Description |
| --- | --- | --- |
| ContinuousExportName | String | Name of the continuous export. The name must be unique within the database and is used to periodically run the continuous export. |
| ExternalTableName | String | Name of the external table to export to. |
| Query | String | Query to export. |
| over (T1, T2) | String | An optional comma-separated list of fact tables in the query. If not specified, all tables referenced in the query are assumed to be fact tables. If specified, tables not in this list are treated as dimension tables and aren't scoped (all records participate in all exports). See the notes section for details. |
| intervalBetweenRuns | Timespan | The time span between continuous export executions. Must be greater than 1 minute. |
| forcedLatency | Timespan | An optional period of time that limits the query to records that were ingested before this period (relative to the current time). This property is useful if, for example, the query performs aggregations or joins and you want to make sure all relevant records have already been ingested before the export runs. |

In addition to the above, all properties supported in the export to external table command are supported in the continuous export create command.

Example:

.create-or-alter continuous-export MyExport
over (T)
to table ExternalBlob
with
(intervalBetweenRuns=1h, 
 forcedLatency=10m, 
 sizeLimit=104857600)
<| T

| Name | ExternalTableName | Query | ForcedLatency | IntervalBetweenRuns | CursorScopedTables | ExportProperties |
| --- | --- | --- | --- | --- | --- | --- |
| MyExport | ExternalBlob | T | 00:10:00 | 01:00:00 | ["['DB'].['T']"] | {"SizeLimit": 104857600} |
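
The over parameter matters most when the query combines a fact table with a dimension table. The following is a sketch only: the tables Events and Devices, the key column DeviceId, and the external table ExternalEvents are hypothetical names, and ExternalEvents is assumed to have a schema that matches the output of the query.

```kusto
// Events is listed in over(), so it is treated as a fact table and is
// scoped to the database cursor (each record is processed once).
// Devices is not listed, so it is treated as a dimension table and all
// of its records take part in every run.
.create-or-alter continuous-export EventsWithDevices
over (Events)
to table ExternalEvents
with (intervalBetweenRuns=1h, forcedLatency=10m)
<| Events
| lookup kind=leftouter Devices on DeviceId
```

With a left outer lookup, fact records whose key is missing from Devices at run time are exported with null dimension columns; forcedLatency gives the dimension table some time to catch up before the fact records are processed.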

Show continuous export

Syntax:

.show continuous-export ContinuousExportName

Returns the continuous export properties of ContinuousExportName.

Properties:

| Property | Type | Description |
| --- | --- | --- |
| ContinuousExportName | String | Name of the continuous export. |

.show continuous-exports

Returns all continuous exports in the database.

Output:

| Output parameter | Type | Description |
| --- | --- | --- |
| CursorScopedTables | String | List of explicitly scoped (fact) tables (JSON serialized) |
| ExportProperties | String | Export properties (JSON serialized) |
| ExportedTo | DateTime | The last datetime (ingestion time) that was exported successfully |
| ExternalTableName | String | Name of the external table |
| ForcedLatency | TimeSpan | Forced latency (null if not provided) |
| IntervalBetweenRuns | TimeSpan | Interval between runs |
| IsDisabled | Boolean | True if the continuous export is disabled |
| IsRunning | Boolean | True if the continuous export is currently running |
| LastRunResult | String | The result of the last continuous export run (Completed or Failed) |
| LastRunTime | DateTime | The last time the continuous export was executed (start time) |
| Name | String | Name of the continuous export |
| Query | String | Export query |
| StartCursor | String | Starting point of the first execution of this continuous export |

Show continuous export artifacts

Syntax:

.show continuous-export ContinuousExportName exported-artifacts

Returns all artifacts exported by the continuous export in all runs. Filter the results by the Timestamp column in the command to view only the records of interest. The history of exported artifacts is retained for 14 days.

Properties:

| Property | Type | Description |
| --- | --- | --- |
| ContinuousExportName | String | Name of the continuous export. |

Output:

| Output parameter | Type | Description |
| --- | --- | --- |
| Timestamp | Datetime | Timestamp of the continuous export run |
| ExternalTableName | String | Name of the external table |
| Path | String | Output path |
| NumRecords | long | Number of records exported to the path |

Example:

.show continuous-export MyExport exported-artifacts | where Timestamp > ago(1h)

| Timestamp | ExternalTableName | Path | NumRecords | SizeInBytes |
| --- | --- | --- | --- | --- |
| 2018-12-20 07:31:30.2634216 | ExternalBlob | http://storageaccount.blob.core.chinacloudapi.cn/container1/1_6ca073fd4c8740ec9a2f574eaa98f579.csv | 10 | 1024 |

Show continuous export failures

Syntax:

.show continuous-export ContinuousExportName failures

Returns all failures logged as part of the continuous export. Filter the results by the Timestamp column in the command to view only the time range of interest.

Properties:

| Property | Type | Description |
| --- | --- | --- |
| ContinuousExportName | String | Name of the continuous export |

Output:

| Output parameter | Type | Description |
| --- | --- | --- |
| Timestamp | Datetime | Timestamp of the failure. |
| OperationId | String | Operation ID of the failure. |
| Name | String | Continuous export name. |
| LastSuccessRun | Timestamp | The last successful run of the continuous export. |
| FailureKind | String | Failure/PartialFailure. PartialFailure indicates that some artifacts were exported successfully before the failure occurred. |
| Details | String | Failure error details. |

Example:

.show continuous-export MyExport failures

| Timestamp | OperationId | Name | LastSuccessRun | FailureKind | Details |
| --- | --- | --- | --- | --- | --- |
| 2019-01-01 11:07:41.1887304 | ec641435-2505-4532-ba19-d6ab88c96a9d | MyExport | 2019-01-01 11:06:35.6308140 | Failure | Details... |

Drop continuous export

Syntax:

.drop continuous-export ContinuousExportName

Properties:

| Property | Type | Description |
| --- | --- | --- |
| ContinuousExportName | String | Name of the continuous export |

Output:

The remaining continuous exports in the database (post deletion). The output schema is the same as that of the show continuous export command.

Disable or enable continuous export

Syntax:

.enable continuous-export ContinuousExportName

.disable continuous-export ContinuousExportName

You can disable or enable the continuous export job. A disabled continuous export won't be executed, but its current state is persisted and can be resumed when the continuous export is enabled. When you enable a continuous export that has been disabled for a long time, exporting continues from where it stopped when the export was disabled. This continuation may result in a long-running export that blocks other exports from running if there isn't sufficient cluster capacity to serve all processes. Continuous exports are executed by last run time in ascending order (the oldest export runs first, until catch-up is complete).

Properties:

| Property | Type | Description |
| --- | --- | --- |
| ContinuousExportName | String | Name of the continuous export |

Output:

The result of the show continuous export command for the altered continuous export.
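
A pause-and-resume cycle could look like the sketch below, which reuses the MyExport name from the earlier example and projects a few of the columns listed in the show continuous export output.

```kusto
// Pause the export. Its current state is persisted.
.disable continuous-export MyExport

// Resume later. Exporting continues from where it stopped.
.enable continuous-export MyExport

// Inspect the state and progress after re-enabling.
.show continuous-export MyExport
| project Name, IsDisabled, IsRunning, LastRunResult, ExportedTo
```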

Exporting historical data

Continuous export starts exporting data only from the point of its creation. Records ingested before that time should be exported separately using the (non-continuous) export command. To avoid duplicates with data exported by continuous export, use the StartCursor returned by the show continuous export command and export only the records for which cursor_before_or_at the cursor value is true. See the example below. Historical data may be too large to be exported in a single export command. In that case, partition the query into several smaller batches.

.show continuous-export MyExport | project StartCursor

StartCursor
636751928823156645

Followed by:

.export async to table ExternalBlob
<| T | where cursor_before_or_at("636751928823156645")
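
If the historical data is too large for one command, one possible way to split it is by ingestion time ranges. This is a sketch under assumptions: the split point datetime(2018-06-01) is an arbitrary illustrative value, and the IngestionTime policy is assumed to have been enabled for all historical records in T (records without an ingestion time would match neither batch).

```kusto
// Batch 1: historical records ingested before the (hypothetical) split point.
.export async to table ExternalBlob
<| T
| where cursor_before_or_at("636751928823156645")
| where ingestion_time() < datetime(2018-06-01)

// Batch 2: historical records ingested at or after the split point,
// still bounded by the continuous export's StartCursor.
.export async to table ExternalBlob
<| T
| where cursor_before_or_at("636751928823156645")
| where ingestion_time() >= datetime(2018-06-01)
```

Keeping the cursor_before_or_at filter in every batch is what prevents overlap with the records handled by the continuous export.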