使用保留策略管理历史数据Manage historical data with retention policy

可以分别在数据库和任何基础表上启用数据保留,从而支持用户创建用于表和数据库的灵活的老化策略。Data Retention can enabled on the database and any of the underlying tables individually, allowing users to create flexible aging policies for their tables and databases. 应用数据保留很简单:仅需要在创建表或更改表操作期间设置一个参数。Applying data retention is simple: it requires only one parameter to be set during table creation or as part of an alter table operation.

定义数据库和基础表的数据保留策略后,将运行后台时间计时器任务,以从启用了数据保留的表中删除任何已过时的记录。After data retention policy is defiend for a database and the underlying table, a background time timer task runs to remove any obsolete records from the table enabled for data retention. 识别匹配行并将其从表中删除,这些操作以透明方式在系统计划和运行的后台任务中进行。Identification of matching rows and their removal from the table occur transparently, in the background task that is scheduled and run by the system. 根据表定义中用作 filter_column 的列检查表行的期限条件。Age condition for the table rows is checked based on the column used as the filter_column in the table definition. 例如,如果保留期设置为一周,则应清理的表行应符合以下条件之一:If retention period, for example, is set to one week, table rows eligible for cleanup satisfy either of the following condition:

  • 如果筛选器列使用 DATETIMEOFFSET 数据类型,那么条件为 filter_column < DATEADD(WEEK, -1, SYSUTCDATETIME())If the filter column uses DATETIMEOFFSET data type then the condition is filter_column < DATEADD(WEEK, -1, SYSUTCDATETIME())
  • 否则条件为 filter_column < DATEADD(WEEK, -1, SYSDATETIME())Else then the condition is filter_column < DATEADD(WEEK, -1, SYSDATETIME())

数据保留清理阶段Data retention cleanup phases

数据保留清理操作包括两个阶段。Data retention cleanup operation comprises of two phases.

  • 发现阶段 - 在此阶段中,清理操作将标识用户数据库中的所有表,以生成清理列表。Discovery Phase - In this phase the cleanup operation identifies all the tables within the user databases to build a list for cleanup. 发现一天运行一次。Discovery runs once a day.
  • 清理阶段 - 在此阶段中,对发现阶段标识的具有有限数据保留期的所有表运行清理。Cleanup Phase - In this phase, cleanup is run against all tables with finite data retention, identified in the discovery phase. 如果无法对表执行清理操作,则会在当前运行中跳过该表,并将在下一次迭代中重试。If the cleanup operation cannot be performed on a table, then that table is skipped in the current run and will be retried in the next iteration. 清理过程中使用以下原则The following principles are used during cleanup
    • 如果已过时的行被另一个事务锁定,则会跳过该行。If an obsolete row is locked by another transaction, that row is skipped.
    • 清理运行,默认锁定超时设置为 5 秒。Clean up runs with a default 5 seconds lock timeout setting. 如果无法在超时时限内获取表上的锁定,则会在当前运行中跳过该表,并将在下一次迭代中重试。If the locks cannot be acquired on the tables within the timeout window, the table is skipped in the current run and will be retried in the next iteration.
    • 如果在清理表的过程中发生错误,则会跳过该表,并将在下一次迭代中选取该表。If there is an error during cleanup of a table, that table is skipped and will be picked up in the next iteration.

手动清理Manual cleanup

自动清理线程可能无法在其运行期间完全删除所有已过时的行,具体取决于表上的数据保留设置以及数据库上工作负载的性质。Depending on the data retention settings on a table and the nature of the workload on the database, it's possible that the automatic cleanup thread may not completely remove all obsolete rows during its run. 为了解决此问题并允许用户手动删除过时的行,Azure SQL Edge 中引入了 sys.sp_cleanup_data_retention 存储过程。To assist with this and allow users to manually remove obsolete rows, the sys.sp_cleanup_data_retention stored procedure has been introduced in Azure SQL Edge.

此存储过程采用三个参数。This stored procedure takes three parameters. - 架构名称 - 表自身的架构的名称。Schema Name - Name of the owning schema for the table. 这是必需参数。This is a required parameter. - 表名称 - 正在为其运行手动清理的表的名称。Table Name - Name of the table for which manual cleanup is being run. 这是必需参数。This is a required parameter. - rowcount(输出)- 输出变量。rowcount (Output) - output variable. 返回由手动清理 sp 清理的行数。Returns the number of rows cleaned up by the manual cleanup sp. 这是一个可选参数。This is an optional parameter.

以下示例演示了如何为表 dbo.data_retention_table 执行手动清理 sp。The following example shows the execution of the manual cleanup sp for table dbo.data_retention_table.

declare @rowcnt bigint 
EXEC sys.sp_cleanup_data_retention 'dbo', 'data_retention_table', @rowcnt output 
select @rowcnt 

如何删除已过时的行How obsolete rows are deleted

清理过程取决于表的索引布局。The cleanup process depends on the index layout of the table. 对于具有有限保留期的所有表,系统会创建一个后台任务来执行已过时数据的清理。A background task is created to perform obsolete data cleanup for all tables with finite retention period. 行存储(B 树或堆)索引的清理逻辑以较小的区块(最大 10K)删除陈旧行,从而可以最大程度地减轻数据库日志和 IO 子系统的压力。Clean up logic for the rowstore (B-tree or Heap) index deletes aged row in smaller chunks (up to 10K) minimizing pressure on database log and IO subsystem. 虽然清理逻辑使用要求的 B 树索引,但无法完全保证删除超过保留期的行的顺序。Although cleanup logic utilizes required B-tree index, order of deletions for the rows older than retention period cannot be firmly guaranteed. 因此,请不要对应用程序中的清理顺序有任何期待 。Hence, do not take any dependency on the cleanup order in your applications.

聚集列存储的清理任务可同时移除整个行组(每组通常包含 100 万行),效率非常高,在数据高速生成和老化时尤其如此。The cleanup task for the clustered columnstore removes entire row groups at once (typically contain 1 million of rows each), which is very efficient, especially when data is generated and ages out at a high pace.

数据保留清理

聚集列存储索引具有优秀的数据压缩和高效的保留清理能力,是工作负载快速生成大量数据时的最佳选择。Excellent data compression and efficient retention cleanup makes clustered columnstore index a perfect choice for scenarios when your workload rapidly generates high amount of data.

备注

对于 B 树索引和堆,数据保留会对基础表运行删除查询,这可能会与表的删除触发器相冲突。In the case of B-Tree Indexes and heaps, data retention runs a delete query on the underlying tables, which can conflict with delete triggers on the tables. 建议删除表中的删除触发器,或不启用具有删除 DML 触发器的表的数据保留。It is recommended to either remove delete triggers from the tables or to not enable data retention on tables that have delete DML trigger.

监视数据保留清理Monitoring data retention cleanup

可以使用 Azure SQL Edge 中的扩展事件 (XEvents) 监视数据保留策略清理操作。Data retention policy cleanup operations can be monitored using extended events (XEvents) in Azure SQL Edge. 有关扩展事件的详细信息,请参阅 XEvents 概述For more information on extended events, refer XEvents Overview.

以下六个扩展事件有助于跟踪清理操作的状态。The following six extended events help track the state of the cleanup operations.

名称Name 说明Description
data_retention_task_starteddata_retention_task_started 在使用保留策略清理表的后台任务开始时发生。Occurs when background task for cleanup of tables with retention policy starts.
data_retention_task_completeddata_retention_task_completed 在使用保留策略清理表的后台任务结束时发生。Occurs when background task for cleanup of tables with retention policy ends.
data_retention_task_exceptiondata_retention_task_exception 在特定于表的保留清理过程外,使用保留策略清理表的后台任务失败时发生。Occurs when background task for cleanup of tables with retention policy fails outside of retention cleanup process specific to table.
data_retention_cleanup_starteddata_retention_cleanup_started 在使用数据保留策略的表的清理过程开始时发生。Occurs when clean up process of table with data retention policy starts.
data_retention_cleanup_exceptiondata_retention_cleanup_exception 在使用保留策略清理表的过程失败时发生。Occurs cleanup process of table with retention policy fails.
data_retention_cleanup_completeddata_retention_cleanup_completed 在使用数据保留策略的表的清理过程结束时发生。Occurs when clean up process of table with data retention policy ends.

此外,sys.dm_os_ring_buffers 动态管理视图中添加了名为 RING_BUFFER_DATA_RETENTION_CLEANUP 的新环形缓冲区类型。Additionally, a new ring buffer type named RING_BUFFER_DATA_RETENTION_CLEANUP has been added to sys.dm_os_ring_buffers dynamic management view. 此视图可用于监视数据保留清理操作。This view can be used to monitor the data retention cleanup operations.

后续步骤Next Steps