Create a trigger that runs a pipeline in response to a storage event

Applies to: Azure Data Factory, Azure Synapse Analytics

This article describes the storage event triggers that you can create in your Data Factory pipelines.

Event-driven architecture (EDA) is a common data integration pattern that involves the production, detection, consumption of, and reaction to events. Data integration scenarios often require Data Factory customers to trigger pipelines based on events happening in a storage account, such as the arrival or deletion of a file in an Azure Blob Storage account. Data Factory natively integrates with Azure Event Grid, which lets you trigger pipelines on such events.

Note

The integration described in this article depends on Azure Event Grid. Make sure that your subscription is registered with the Event Grid resource provider. For more information, see Resource providers and types. You must be able to do the Microsoft.EventGrid/eventSubscriptions/* action. This action is part of the EventGrid EventSubscription Contributor built-in role.

Data Factory UI

This section shows you how to create a storage event trigger within the Azure Data Factory user interface.

  1. Switch to the Edit tab, shown with a pencil symbol.

  2. Select Trigger on the menu, then select New/Edit.

  3. On the Add Triggers page, select Choose trigger..., then select +New.

  4. Select the Storage Event trigger type.

    Screenshot of the Authoring page for creating a new storage event trigger in the Data Factory UI.

  5. Select your storage account from the Azure subscription dropdown, or manually by using its storage account resource ID. Choose the container on which you want the events to occur. Container selection is required, but be mindful that selecting all containers can lead to a large number of events.

    Note

    The storage event trigger currently supports only Azure Data Lake Storage Gen2 and General-purpose version 2 storage accounts. Due to an Azure Event Grid limitation, Azure Data Factory supports a maximum of 500 storage event triggers per storage account.

    Note

    To create a new storage event trigger or modify an existing one, the Azure account used to sign in to Data Factory and publish the trigger must have appropriate role-based access control (Azure RBAC) permission on the storage account. No additional permission is required: the service principal for Azure Data Factory does not need special permission to either the storage account or Event Grid. For more information about access control, see the Role-based access control section.

  6. The Blob path begins with and Blob path ends with properties let you specify the containers, folders, and blob names for which you want to receive events. Your storage event trigger requires at least one of these properties to be defined. You can use a variety of patterns for both properties, as shown in the examples later in this article.

    • Blob path begins with: The blob path must start with a folder path. Valid values include 2018/ and 2018/april/shoes.csv. This field can't be selected if a container isn't selected.
    • Blob path ends with: The blob path must end with a file name or extension. Valid values include shoes.csv and .csv. Container and folder names, when specified, must be separated by a /blobs/ segment. For example, a container named 'orders' can have a value of /orders/blobs/2018/april/shoes.csv. To specify a folder in any container, omit the leading '/' character. For example, april/shoes.csv triggers an event on any file named shoes.csv in a folder called 'april' in any container.
    • Note that Blob path begins with and Blob path ends with are the only pattern matching allowed in a storage event trigger. Other types of wildcard matching aren't supported for this trigger type.
  7. Select whether your trigger responds to a Blob created event, a Blob deleted event, or both. In your specified storage location, each event triggers the Data Factory pipelines associated with the trigger.

    Screenshot of the storage event trigger creation page.

  8. Select whether or not your trigger ignores blobs with zero bytes.

  9. After you configure your trigger, click Next: Data preview. This screen shows the existing blobs matched by your storage event trigger configuration. Make sure you use specific filters. Filters that are too broad can match a large number of created or deleted files and may significantly impact your cost. Once your filter conditions have been verified, click Finish.

    Screenshot of the storage event trigger preview page.

  10. To attach a pipeline to this trigger, go to the pipeline canvas, click Trigger, and select New/Edit. When the side nav appears, click the Choose trigger... dropdown and select the trigger you created. Click Next: Data preview to confirm the configuration is correct, then Next to validate that the data preview is correct.

  11. If your pipeline has parameters, you can specify them on the trigger runs parameter side nav. The storage event trigger captures the folder path and file name of the blob into the properties @triggerBody().folderPath and @triggerBody().fileName. To use the values of these properties in a pipeline, you must map the properties to pipeline parameters. After mapping the properties to parameters, you can access the values captured by the trigger through the @pipeline().parameters.parameterName expression throughout the pipeline. A JSON sketch of this mapping follows these steps. For a detailed explanation, see Reference Trigger Metadata in Pipelines.

    Screenshot of a storage event trigger mapping properties to pipeline parameters.

    In the preceding example, the trigger is configured to fire when a blob path ending in .csv is created in the folder event-testing in the container sample-data. The folderPath and fileName properties capture the location of the new blob. For example, when MoviesDB.csv is added to the path sample-data/event-testing, @triggerBody().folderPath has a value of sample-data/event-testing and @triggerBody().fileName has a value of moviesDB.csv. In the example, these values are mapped to the pipeline parameters sourceFolder and sourceFile, which can be used throughout the pipeline as @pipeline().parameters.sourceFolder and @pipeline().parameters.sourceFile respectively.

  12. Click Finish once you are done.
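
As a reference for steps 10 and 11, the parameter mapping is stored on the trigger definition itself, under its pipeline reference. Here is a minimal sketch, assuming a hypothetical trigger named MoviesEventTrigger and a hypothetical pipeline named MoviesPipeline with the sourceFolder and sourceFile parameters from the example above; the typeProperties are omitted (see the JSON schema section that follows):

```json
{
    "name": "MoviesEventTrigger",
    "properties": {
        "type": "BlobEventsTrigger",
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "MoviesPipeline",
                    "type": "PipelineReference"
                },
                "parameters": {
                    "sourceFolder": "@triggerBody().folderPath",
                    "sourceFile": "@triggerBody().fileName"
                }
            }
        ]
    }
}
```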

JSON schema

The following table provides an overview of the schema elements that are related to storage event triggers:

JSON element | Description | Type | Allowed values | Required
scope | The Azure Resource Manager resource ID of the storage account. | String | An Azure Resource Manager ID | Yes
events | The type of events that cause this trigger to fire. | Array | Microsoft.Storage.BlobCreated, Microsoft.Storage.BlobDeleted | Yes, any combination of these values.
blobPathBeginsWith | The blob path must begin with the provided pattern for the trigger to fire. For example, /records/blobs/december/ only fires the trigger for blobs in the december folder under the records container. | String | | Provide a value for at least one of these properties: blobPathBeginsWith or blobPathEndsWith.
blobPathEndsWith | The blob path must end with the provided pattern for the trigger to fire. For example, december/boxes.csv only fires the trigger for blobs named boxes.csv in a december folder. | String | | Provide a value for at least one of these properties: blobPathBeginsWith or blobPathEndsWith.
ignoreEmptyBlobs | Whether or not zero-byte blobs trigger a pipeline run. By default, this is set to true. | Boolean | true or false | No
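
Putting these elements together, a complete trigger definition might look like the following sketch. The trigger and pipeline names and the scope ID are placeholders, and the paths reuse the sample-data/event-testing example from earlier in this article:

```json
{
    "name": "MyStorageEventTrigger",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
            "events": [ "Microsoft.Storage.BlobCreated" ],
            "blobPathBeginsWith": "/sample-data/blobs/event-testing/",
            "blobPathEndsWith": ".csv",
            "ignoreEmptyBlobs": true
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "MoviesPipeline",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}
```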

Examples of storage event triggers

This section provides examples of storage event trigger settings.

Important

You have to include the /blobs/ segment of the path, as shown in the following examples, whenever you specify container and folder, container and file, or container, folder, and file. For blobPathBeginsWith, the Data Factory UI automatically adds /blobs/ between the folder and container name in the trigger JSON.

Property | Example | Description
Blob path begins with | /containername/ | Receives events for any blob in the container.
Blob path begins with | /containername/blobs/foldername/ | Receives events for any blobs in the containername container and foldername folder.
Blob path begins with | /containername/blobs/foldername/subfoldername/ | You can also reference a subfolder.
Blob path begins with | /containername/blobs/foldername/file.txt | Receives events for a blob named file.txt in the foldername folder under the containername container.
Blob path ends with | file.txt | Receives events for a blob named file.txt in any path.
Blob path ends with | /containername/blobs/file.txt | Receives events for a blob named file.txt under container containername.
Blob path ends with | foldername/file.txt | Receives events for a blob named file.txt in a foldername folder under any container.
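
As a sketch of how a row of this table maps onto the trigger JSON (containername and foldername are placeholders), the second Blob path begins with example corresponds to typeProperties like:

```json
{
    "typeProperties": {
        "blobPathBeginsWith": "/containername/blobs/foldername/",
        "events": [ "Microsoft.Storage.BlobCreated", "Microsoft.Storage.BlobDeleted" ]
    }
}
```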

Role-based access control

Azure Data Factory uses Azure role-based access control (Azure RBAC) to ensure that unauthorized access to listen to blob events, subscribe to updates from them, and trigger pipelines linked to blob events is strictly prohibited.

  • To successfully create a new storage event trigger or update an existing one, the Azure account signed in to Data Factory needs appropriate access to the relevant storage account. Otherwise, the operation will fail with Access Denied.
  • Data Factory needs no special permission to your Event Grid, and you do not need to assign special RBAC permission to the Data Factory service principal for the operation.

Any of the following RBAC settings works for the storage event trigger:

  • Owner role on the storage account
  • Contributor role on the storage account
  • Microsoft.EventGrid/EventSubscriptions/Write permission to the storage account /subscriptions/####/resourceGroups/####/providers/Microsoft.Storage/storageAccounts/storageAccountName
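
If the Owner or Contributor roles are broader than you want, a custom role that carries just the event subscription write permission is one option. The following is a minimal sketch of such a custom role definition; the role name and assignable scope are hypothetical:

```json
{
    "Name": "Storage Event Subscription Writer",
    "IsCustom": true,
    "Description": "Can create Event Grid event subscriptions on storage accounts.",
    "Actions": [
        "Microsoft.EventGrid/EventSubscriptions/Write"
    ],
    "NotActions": [],
    "AssignableScopes": [
        "/subscriptions/<subscription-id>"
    ]
}
```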

To understand how Azure Data Factory delivers these two promises, let's take a step back and peek behind the scenes. Here are the high-level workflows for integration among Data Factory, Storage, and Event Grid.

Create a new storage event trigger

This high-level workflow describes how Azure Data Factory interacts with Event Grid to create a storage event trigger.

Workflow of storage event trigger creation.

Two noticeable callouts from this workflow:

  • Azure Data Factory makes no direct contact with the storage account. The request to create a subscription is instead relayed and processed by Event Grid. Hence, Data Factory needs no permission to the storage account for this step.

  • Access control and permission checking happen on the Azure Data Factory side. Before ADF sends a request to subscribe to the storage event, it checks the permission for the user. More specifically, it checks whether the Azure account that is signed in and attempting to create the storage event trigger has appropriate access to the relevant storage account. If the permission check fails, trigger creation also fails.

Storage event triggers a Data Factory pipeline run

This high-level workflow describes how a storage event triggers a pipeline run through Event Grid.

Workflow of a storage event triggering a pipeline run.

When it comes to an event triggering a pipeline in Data Factory, there are three noticeable callouts in the workflow:

  • Event Grid uses a push model that relays the message as soon as possible after storage drops the message into the system. This is different from a messaging system such as Kafka, where a pull system is used.

  • The event trigger on Azure Data Factory serves as an active listener for the incoming message, and it properly triggers the associated pipeline.

  • The storage event trigger itself makes no direct contact with the storage account.

    • That said, if you have a Copy or other activity inside the pipeline to process the data in the storage account, Data Factory makes direct contact with Storage, using the credentials stored in the linked service. Ensure that the linked service is set up appropriately; a minimal sketch follows this list.
    • However, if you make no reference to the storage account in the pipeline, you do not need to grant Data Factory permission to access the storage account.
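
For example, a pipeline's Copy activity reading from the storage account would authenticate through a linked service such as the following minimal sketch; the linked service name is hypothetical and the connection string is a placeholder (a managed identity or Key Vault reference works as well):

```json
{
    "name": "AzureBlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net"
        }
    }
}
```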

Next steps