数据库游标Database cursors

数据库游标是一个数据库级对象,可用于多次查询数据库。A database cursor is a database-level object that lets you query a database multiple times. 即使存在与查询并行发生的 data-appenddata-retention 操作,也会获得一致的结果。You'll get consistent results even if there are data-append or data-retention operations happening in parallel with the queries.

数据库游标旨在解决下述两个重要方案的问题:Database cursors are designed to address two important scenarios:

  • 只要查询指示“相同数据集”,就可以多次重复相同的查询并获得相同的结果。The ability to repeat the same query multiple times and get the same results, as long as the query indicates "same data set".

  • 能够进行“恰好一次”查询。The ability to make an "exactly once" query. 此查询仅“看到”先前查询未看到的数据(因为那时该数据不可用)。This query only "sees" the data that a previous query didn't see, because the data wasn't available then. 例如,可以使用该查询来循环访问表中所有新到达的数据,而不必担心处理同一记录两次或错误地跳过记录。The query lets you iterate, for example, through all the newly arrived data in a table without fear of processing the same record twice or skipping records by mistake.

数据库游标在查询语言中表示为 string 类型的标量值。The database cursor is represented in the query language as a scalar value of type string. 应将实际值视为不透明值。除了保存其值或使用下面所述的游标函数外,不支持任何其他操作。The actual value should be considered opaque and there's no support for any operation other than to save its value or use the cursor functions noted below.

游标函数Cursor functions

Kusto 提供三个函数来帮助实现上述两个方案:Kusto provides three functions to help implement the two above scenarios:

  • cursor_current():使用此函数可检索数据库游标的当前值。cursor_current(): Use this function to retrieve the current value of the database cursor. 可以使用此值作为其他两个函数的参数。You can use this value as an argument to the two other functions. 此函数还有一个同义词,即 current_cursor()This function also has a synonym, current_cursor().

  • cursor_after(rhs:string):此特殊函数可用于启用了 IngestionTime 策略的表记录。cursor_after(rhs:string): This special function can be used on table records that have the IngestionTime policy enabled. 它将返回 bool 类型的标量值,指示记录的 ingestion_time() 数据库游标值是否在 rhs 数据库游标值之后。It returns a scalar value of type bool indicating whether the record's ingestion_time() database cursor value comes after the rhs database cursor value.

  • cursor_before_or_at(rhs:string):此特殊函数可用于启用了 IngestionTime 策略的表记录。cursor_before_or_at(rhs:string): This special function can be used on the table records that have the IngestionTime policy enabled. 它返回一个 bool 类型的标量值,指示记录的 ingestion_time() 数据库游标值是在 rhs 数据库游标值之前还是在该值处。It returns a scalar value of type bool indicating whether the record's ingestion_time() database cursor value comes before or at the rhs database cursor value.

这两个特殊函数(cursor_aftercursor_before_or_at)也有副作用:使用这两个函数时,Kusto 会将 数据库游标的当前值 发送到查询的 @ExtendedProperties 结果集。The two special functions (cursor_after and cursor_before_or_at) also have a side-effect: When they're used, Kusto will emit the current value of the database cursor to the @ExtendedProperties result set of the query. 游标的属性名称为 Cursor,其值为一个 stringThe property name for the cursor is Cursor, and its value is a single string.

例如:For example:

{"Cursor" : "636040929866477946"}

限制Restrictions

数据库游标只能用于已启用 IngestionTime 策略的表。Database cursors can only be used with tables for which the IngestionTime policy has been enabled. 此类表中的每条记录都与引入记录时有效的数据库游标的值相关联。Each record in such a table is associated with the value of the database cursor that was in effect when the record was ingested. 因此,可以使用 ingestion_time() 函数。As such, the ingestion_time() function can be used.

除非数据库包含至少一个定义了 IngestionTime 策略的表,否则数据库游标对象不保存任何有意义的值。The database cursor object holds no meaningful value unless the database has at least one table that has an IngestionTime policy defined. 系统会确保根据引入历史记录的需要将该值更新到此类表中,然后运行引用此类表的查询。This value is guaranteed to update, as-needed by the ingestion history, into such tables and the queries run, that reference such tables. 在其他情况下,它不一定会更新。It might, or might not, be updated in other cases.

引入过程首先会提交数据,使其可用于查询,然后才会为每条记录分配一个实际的游标值。The ingestion process first commits the data, so that it's available for querying, and only then assigns an actual cursor value to each record. 如果尝试在引入完成后立即使用数据库游标查询数据,则结果可能尚未包含最后添加的记录,因为尚未为这些记录分配游标值。If you attempt to query for data immediately following the ingestion completion using a database cursor, the results might not yet incorporate the last records added, because they haven't yet been assigned the cursor value. 同样,即使引入在两次检索之间进行,重复检索当前数据库游标值也可能返回相同的值,因为只有游标提交才能更新其值。Also, retrieving the current database cursor value repeatedly might return the same value, even if ingestion was done in between, because only a cursor commit can update its value.

基于数据库游标对表进行查询只有在记录被直接引入到该表的情况下才能保证有效(仅提供一次保证)。Querying a table based on database cursors is only guaranteed to "work" (providing exactly-once guarantees) if the records are ingested directly into that table. 如果使用 move extents/.replace extents 等盘区命令将数据移到表中,或使用 .rename table,那么使用数据库游标查询该表无法保证不会遗漏任何数据。If you are using extents commands, such as move extents/.replace extents to move data into the table, or if you are using .rename table, then querying this table using database cursors is not guaranteed to not miss any data. 这是因为记录的引入时间是最初引入时分配的,并且在移动盘区操作期间无法更改。This is because the ingestion time of the records is assigned when initially ingested, and does not change during the move extents operation. 因此,将盘区移动到目标表中时,可能已经处理分配到这些盘区中的记录的游标值(并且数据库游标的下一个查询将会遗漏新记录)。Therefore, when the extents are moved into the target table, it's possible that the cursor value assigned to the records in these extents was already processed (and next query by database cursor will miss the new records).

示例:仅处理一次记录Example: Processing records exactly once

对于架构为 [Name, Salary] 的表 Employees,若要在新记录引入到表中时连续处理这些记录,请使用以下过程:For a table Employees with schema [Name, Salary], to continuously process new records as they're ingested into the table, use the following process:

// [Once] Enable the IngestionTime policy on table Employees
.set table Employees policy ingestiontime true

// [Once] Get all the data that the Employees table currently holds 
Employees | where cursor_after('')

// The query above will return the database cursor value in
// the @ExtendedProperties result set. Lets assume that it returns
// the value '636040929866477946'

// [Many] Get all the data that was added to the Employees table
// since the previous query was run using the previously-returned
// database cursor 
Employees | where cursor_after('636040929866477946') // -> 636040929866477950

Employees | where cursor_after('636040929866477950') // -> 636040929866479999

Employees | where cursor_after('636040929866479999') // -> 636040939866479000