Configure streaming ingestion on your Azure Data Explorer cluster using the Azure portal

Use streaming ingestion to load data when you need low latency between ingestion and query. The streaming ingestion operation completes in under 10 seconds, and your data is immediately available for query after completion. This method suits workloads that ingest a high total volume of data, such as thousands of records per second, spread over thousands of tables, where each table receives a relatively low volume, such as a few records per second.

Use bulk ingestion instead of streaming ingestion when the amount of data ingested exceeds 4 GB per hour per table.
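
As a rough illustration of the threshold above, the following Python sketch picks an ingestion method from a table's hourly volume. The function name `choose_ingestion_method` and the reading of 4 GB as 4 × 1024³ bytes are assumptions made for this example; they are not part of Azure Data Explorer.

```python
# Hypothetical helper: the names and the byte interpretation of "4 GB"
# are assumptions made for illustration only.
BULK_THRESHOLD_BYTES_PER_HOUR = 4 * 1024**3  # 4 GB per hour per table


def choose_ingestion_method(bytes_per_hour_per_table: int) -> str:
    """Return "bulk" when a table's hourly volume exceeds 4 GB, else "streaming"."""
    if bytes_per_hour_per_table < 0:
        raise ValueError("volume must be non-negative")
    if bytes_per_hour_per_table > BULK_THRESHOLD_BYTES_PER_HOUR:
        return "bulk"
    return "streaming"


if __name__ == "__main__":
    # A table receiving about 1 KB per second stays far below the threshold.
    print(choose_ingestion_method(1024 * 3600))
    # A table receiving 5 GB per hour is above it.
    print(choose_ingestion_method(5 * 1024**3))
```

The check is per table, so a cluster can exceed 4 GB per hour in total and still be a good fit for streaming ingestion as long as each individual table stays under the threshold.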

To learn more about the different ingestion methods, see the data ingestion overview.

Prerequisites

Enable streaming ingestion on your cluster

Enable streaming ingestion while creating a new cluster in the Azure portal

You can enable streaming ingestion while creating a new Azure Data Explorer cluster.

In the Configurations tab, select Streaming ingestion > On.

Enable streaming ingestion on an existing cluster in the Azure portal

  1. In the Azure portal, go to your Azure Data Explorer cluster.

  2. In Settings, select Configurations.

  3. In the Configurations pane, select On to enable Streaming ingestion.

  4. Select Save.

Warning

Review the limitations before enabling streaming ingestion.

Create a target table and define the policy in the Azure portal

  1. In the Azure portal, navigate to your cluster.

  2. Select Query.

  3. To create the table that will receive the data via streaming ingestion, copy the following command into the Query pane and select Run.

    .create table TestTable (TimeStamp: datetime, Name: string, Metric: int, Source: string)
    

  4. Define the streaming ingestion policy on the table you've created or on the database that contains this table.

    Tip

    A policy that is defined at the database level applies to all existing and future tables in the database.

  5. Copy one of the following commands into the Query pane and select Run.

    .alter table TestTable policy streamingingestion enable
    

    or

    .alter database StreamingTestDb policy streamingingestion enable
    

Use streaming ingestion to ingest data to your cluster

Two streaming ingestion types are supported: Event Hub and custom ingestion.

Choose the appropriate streaming ingestion type

| Criterion | Event Hub | Custom ingestion |
|---|---|---|
| Data delay between ingestion initiation and the data being available for query | Longer delay | Shorter delay |
| Development overhead | Fast and easy setup, no development overhead | High development overhead, because the application must handle errors and ensure data consistency |
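
With custom ingestion, the error handling mentioned in the table is the application's job. One common approach, sketched here in Python, is to retry a failed request with exponential backoff. The `send` callable, the `ingest_with_retry` name, and the retry parameters are all hypothetical; a real application would call its ingestion client inside `send` and decide which errors are actually safe to retry.

```python
import time
from typing import Callable


def ingest_with_retry(
    send: Callable[[bytes], None],
    payload: bytes,
    max_attempts: int = 3,
    base_delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(payload), retrying on any exception.

    Returns the number of attempts used. `send` is a placeholder for
    whatever function submits one streaming ingestion request.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A retry may submit the same payload twice if an earlier attempt succeeded but its response was lost, so an application that needs consistent data has to account for possible duplicates as well.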

Disable streaming ingestion on your cluster

Warning

Disabling streaming ingestion may take a few hours.

Before disabling streaming ingestion on your Azure Data Explorer cluster, drop the streaming ingestion policy from all relevant tables and databases. Removing the streaming ingestion policy triggers data rearrangement inside your Azure Data Explorer cluster. The streaming ingestion data is moved from the initial storage to permanent storage in the column store (extents or shards). This process can take from a few seconds to a few hours, depending on the amount of data in the initial storage.

Drop the streaming ingestion policy in the Azure portal

  1. In the Azure portal, go to your Azure Data Explorer cluster and select Query.

  2. To drop the streaming ingestion policy from the table, copy the following command into the Query pane and select Run.

    .delete table TestTable policy streamingingestion 
    

  3. In Settings, select Configurations.

  4. In the Configurations pane, select Off to disable Streaming ingestion.

  5. Select Save.

Limitations

  • Database cursors aren't supported for a database if the database itself or any of its tables has the streaming ingestion policy defined and enabled.
  • Data mappings must be pre-created for use in streaming ingestion. Individual streaming ingestion requests don't accommodate inline data mappings.
  • Streaming ingestion performance and capacity scale with increased VM and cluster sizes. The number of concurrent ingestion requests is limited to six per core. For example, for 16-core SKUs, such as D14 and L16, the maximum supported load is 96 concurrent ingestion requests. For two-core SKUs, such as D11, the maximum supported load is 12 concurrent ingestion requests.
  • The data size limit for a streaming ingestion request is 4 MB.
  • Schema updates, such as creation and modification of tables and ingestion mappings, may take up to five minutes to reach the streaming ingestion service. For more information, see Streaming ingestion and schema changes.
  • Enabling streaming ingestion on a cluster, even when data isn't ingested via streaming, uses part of the local SSD disk of the cluster machines for streaming ingestion data and reduces the storage available for hot cache.
  • Extent tags can't be set on the streaming ingestion data.
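
The two numeric limits in this list lend themselves to a quick client-side check. The Python sketch below reproduces the arithmetic from the examples above (six concurrent requests per core) and tests a payload against the 4 MB request limit. The helper names are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
# Limits quoted in the list above; the helper names are hypothetical.
CONCURRENT_REQUESTS_PER_CORE = 6
MAX_REQUEST_BYTES = 4 * 1024 * 1024  # 4 MB, assuming 1 MB = 1024 * 1024 bytes


def max_concurrent_requests(cores: int) -> int:
    """Maximum supported concurrent ingestion requests for a SKU with `cores` cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return CONCURRENT_REQUESTS_PER_CORE * cores


def fits_in_one_request(payload: bytes) -> bool:
    """True when the payload is within the 4 MB streaming ingestion request limit."""
    return len(payload) <= MAX_REQUEST_BYTES


if __name__ == "__main__":
    print(max_concurrent_requests(16))  # 16-core SKU such as D14 or L16: 96
    print(max_concurrent_requests(2))   # two-core SKU such as D11: 12
    print(fits_in_one_request(b"a" * 1024))
```

Checking the payload size before sending lets an application split or reroute oversized data instead of discovering the limit through a failed request.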

Next steps