教程:使用事件网格和 Azure Functions 将捕获的事件中心数据迁移到 SQL 数据仓库Tutorial: Migrate captured Event Hubs data to a SQL Data Warehouse using Event Grid and Azure Functions

若要将事件中心的流式处理数据自动传递到 Azure Blob 存储或 Azure Data Lake Store,最容易的方式是使用事件中心捕获Event Hubs Capture is the easiest way to automatically deliver streamed data in Event Hubs to an Azure Blob storage or Azure Data Lake store. 可以随后处理数据并将其传递到所选的任何其他存储目标,例如 SQL 数据仓库或 Cosmos DB。You can subsequently process and deliver the data to any other storage destinations of your choice, such as SQL Data Warehouse or Cosmos DB. 本教程介绍如何使用事件网格触发的 Azure 函数将事件中心的数据捕获到 SQL 数据仓库中。In this tutorial, you learn how you to capture data from your event hub into a SQL data warehouse by using an Azure function triggered by an event grid.

Visual Studio

  • 首先,使用启用的捕获功能创建一个事件中心并将 Azure Blob 存储设置为目标。First, you create an event hub with the Capture feature enabled and set an Azure blob storage as the destination. WindTurbineGenerator 生成的数据会流式传输到事件中心,然后以 Avro 文件形式自动捕获到 Azure 存储中。Data generated by WindTurbineGenerator is streamed into the event hub and is automatically captured into Azure Storage as Avro files.
  • 接下来,请创建一个 Azure 事件网格订阅,使用事件中心命名空间作为其源,使用 Azure Function 终结点作为其目标。Next, you create an Azure Event Grid subscription with the Event Hubs namespace as its source and the Azure Function endpoint as its destination.
  • 通过事件中心捕获功能将新的 Avro 文件传递到 Azure 存储 Blob 时,事件网格就会通过 Blob URI 通知 Azure Function。Whenever a new Avro file is delivered to the Azure Storage blob by the Event Hubs Capture feature, Event Grid notifies the Azure Function with the blob URI. 然后,Function 会将数据从 Blob 迁移到 SQL 数据仓库。The Function then migrates data from the blob to a SQL data warehouse.

在本教程中,将执行以下操作:In this tutorial, you do the following actions:

  • 部署基础结构Deploy the infrastructure
  • 将代码发布到 Functions 应用Publish code to a Functions App
  • 通过 Functions 应用创建事件网格订阅Create an Event Grid subscription from the Functions app
  • 将示例数据流式传输到事件中心。Stream sample data into Event Hub.
  • 在 SQL 数据仓库中验证捕获的数据Verify captured data in SQL Data Warehouse

先决条件Prerequisites

备注

本文进行了更新,以便使用新的 Azure PowerShell Az 模块。This article has been updated to use the new Azure PowerShell Az module. 你仍然可以使用 AzureRM 模块,至少在 2020 年 12 月之前,它将继续接收 bug 修补程序。You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. 若要详细了解新的 Az 模块和 AzureRM 兼容性,请参阅新 Azure Powershell Az 模块简介To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. 有关 Az 模块安装说明,请参阅安装 Azure PowerShellFor Az module installation instructions, see Install Azure PowerShell.

  • Visual Studio 2019Visual studio 2019. 在安装时,请确保安装以下工作负荷:.NET 桌面开发、Azure 开发、ASP.NET 和 Web 开发、Node.js 开发、Python 开发While installing, ensure that you install the following workloads: .NET desktop development, Azure development, ASP.NET and web development, Node.js development, Python development

  • 下载 Git 示例。该示例解决方案包含以下组件:Download the Git sample The sample solution contains the following components:

    • WindTurbineDataGenerator – 一个简单的发布服务器,可以将示例性的风力涡轮机数据发送到启用了捕获功能的事件中心WindTurbineDataGenerator – A simple publisher that sends sample wind turbine data to a Capture-enabled event hub
    • FunctionDWDumper – 一个 Azure Function,可以在 Avro 文件捕获到 Azure 存储 Blob 时接收事件网格通知。FunctionDWDumper – An Azure Function that receives an Event Grid notification when an Avro file is captured to the Azure Storage blob. 它接收 Blob 的 URI 路径、读取其内容并将该数据推送到 SQL 数据仓库。It receives the blob’s URI path, reads its contents, and pushes this data to a SQL Data Warehouse.

    此示例使用最新的 Azure.Messaging.EventHubs 包。This sample uses the latest Azure.Messaging.EventHubs package. 可在此处找到使用 Microsoft.Azure.EventHubs 包的旧示例。You can find the old sample that uses the Microsoft.Azure.EventHubs package here.

部署基础结构Deploy the infrastructure

使用 Azure PowerShell 或 Azure CLI 通过此 Azure 资源管理器模板部署本教程所需的基础结构。Use Azure PowerShell or Azure CLI to deploy the infrastructure needed for this tutorial by using this Azure Resource Manager template. 此模板可创建以下资源:This template creates the following resources:

  • 已启用捕获功能的事件中心Event Hub with the Capture feature enabled
  • 适用于已捕获事件数据的存储帐户Storage account for the captured event data
  • 用于托管 Functions 应用的 Azure 应用服务计划Azure app service plan for hosting the Functions app
  • 用于处理已捕获事件文件的 Function AppFunction app for processing captured event files
  • 用于托管数据仓库的 SQL ServerSQL Server for hosting the Data Warehouse
  • 用于存储已迁移数据的 SQL 数据仓库SQL Data Warehouse for storing the migrated data

以下部分提供的 Azure CLI 和 Azure PowerShell 命令适用于部署本教程所需的基础结构。The following sections provide Azure CLI and Azure PowerShell commands for deploying the infrastructure required for the tutorial. 在运行这些命令之前,请更新以下对象的名称:Update names of the following objects before running the commands:

  • Azure 资源组Azure resource group
  • 资源组的区域Region for the resource group
  • 事件中心命名空间Event Hubs namespace
  • 事件中心Event hub
  • Azure SQL ServerAzure SQL server
  • SQL 用户(和密码)SQL user (and password)
  • Azure SQL 数据库Azure SQL database
  • Azure 存储Azure Storage
  • Azure Functions 应用Azure Functions App

这些脚本需要一定的时间来创建所有 Azure 项目。These scripts take some time to create all the Azure artifacts. 请等待脚本完成,然后再继续操作。Wait until the script completes before proceeding further. 如果部署因某种原因而失败,请删除资源组,修复报告的问题,然后重新运行命令。If the deployment fails for some reason, delete the resource group, fix the reported issue, and rerun the command.

Azure CLIAzure CLI

若要通过 Azure CLI 部署该模板,请使用以下命令:To deploy the template using Azure CLI, use the following commands:

az group create -l chinanorth -n rgDataMigrationSample

az group deployment create `
  --resource-group rgDataMigrationSample `
  --template-uri https://raw.githubusercontent.com/Azure/azure-docs-json-samples/master/event-grid/EventHubsDataMigration.json `
  --parameters eventHubNamespaceName=<event-hub-namespace> eventHubName=hubdatamigration sqlServerName=<sql-server-name> sqlServerUserName=<user-name> sqlServerPassword=<password> sqlServerDatabaseName=<database-name> storageName=<unique-storage-name> functionAppName=<app-name>

Azure PowerShellAzure PowerShell

若要通过 PowerShell 部署该模板,请使用以下命令:To deploy the template using PowerShell, use the following commands:

New-AzResourceGroup -Name rgDataMigration -Location chinaeast

New-AzResourceGroupDeployment -ResourceGroupName rgDataMigration -TemplateUri https://raw.githubusercontent.com/Azure/azure-docs-json-samples/master/event-grid/EventHubsDataMigration.json -eventHubNamespaceName <event-hub-namespace> -eventHubName hubdatamigration -sqlServerName <sql-server-name> -sqlServerUserName <user-name> -sqlServerDatabaseName <database-name> -storageName <unique-storage-name> -functionAppName <app-name>

在 SQL 数据仓库中创建表Create a table in SQL Data Warehouse

在门户中使用 Visual StudioSQL Server Management Studio 或查询编辑器,通过运行 CreateDataWarehouseTable.sql 脚本在 SQL 数据仓库中创建表。Create a table in your SQL data warehouse by running the CreateDataWarehouseTable.sql script using Visual Studio, SQL Server Management Studio, or the Query Editor in the portal.

CREATE TABLE [dbo].[Fact_WindTurbineMetrics] (
    [DeviceId] nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
    [MeasureTime] datetime NULL, 
    [GeneratedPower] float NULL, 
    [WindSpeed] float NULL, 
    [TurbineSpeed] float NULL
)
WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN);

将代码发布到 Functions 应用Publish code to the Functions App

  1. 在 Visual Studio 2019 中打开解决方案 EventHubsCaptureEventGridDemo.sln 。Open the solution EventHubsCaptureEventGridDemo.sln in Visual Studio 2019.

  2. 在“解决方案资源管理器”中,右键单击“FunctionEGDWDumper” ,再选择“发布” 。In Solution Explorer, right-click FunctionEGDWDumper, and select Publish.

    发布函数应用

  3. 选择“Azure Function App” ,再选中“选择现有” 。Select Azure Function App and Select Existing. 选择“发布” 。Select Publish.

    目标函数应用

  4. 选择通过模板部署的函数应用。Select the function app that you deployed through the template. 选择“确定” 。Select OK.

    选择函数应用

  5. 在 Visual Studio 配置好配置文件后,选择“发布” 。When Visual Studio has configured the profile, select Publish.

    选择发布

发布函数后,即可订阅事件中心的捕获事件!After publishing the function, you are ready to subscribe to the capture event from Event Hubs!

通过 Functions 应用创建事件网格订阅Create an Event Grid subscription from the Functions app

  1. 转到 Azure 门户Go to the Azure portal. 选择资源组和函数应用。Select your resource group and function app.

    查看函数应用

  2. 选择函数。Select the function.

    选择函数

  3. 选择“添加事件网格订阅” 。Select Add Event Grid subscription.

    添加订阅

  4. 为事件网格订阅指定名称。Give the event grid subscription a name. 使用“事件中心命名空间” 作为事件类型。Use Event Hubs Namespaces as the event type. 提供用于选择事件中心命名空间实例的值。Provide values to select your instance of the Event Hubs namespace. 将订阅服务器终结点保留为提供的值。Leave the subscriber endpoint as the provided value. 选择“创建” 。Select Create.

    创建订阅

生成示例数据Generate sample data

现在已设置事件中心、SQL 数据仓库、Azure Function App 和事件网格订阅。You have now set up your Event Hub, SQL data warehouse, Azure Function App, and Event Grid subscription. 在源代码中更新连接字符串和事件中心以后,即可运行 WindTurbineDataGenerator.exe 以生成发往事件中心的数据流。You can run WindTurbineDataGenerator.exe to generate data streams to the Event Hub after updating connection string and name of your event hub in the source code.

  1. 在门户中,选择“事件中心”命名空间。In the portal, select your event hub namespace. 选择“连接字符串” 。Select Connection Strings.

    选择“连接字符串”

  2. 选择“RootManageSharedAccessKey” Select RootManageSharedAccessKey

    选择密钥

  3. 复制“连接字符串 - 主密钥” Copy Connection string - primary Key

    复制密钥

  4. 返回到 Visual Studio 项目。Go back to your Visual Studio project. WindTurbineDataGenerator 项目中,打开 program.csIn the WindTurbineDataGenerator project, open program.cs.

  5. EventHubConnectionStringEventHubName 的值更新为事件中心的连接字符串和名称。Update values for EventHubConnectionString and EventHubName with connection string and name of your event hub.

    private const string EventHubConnectionString = "Endpoint=sb://demomigrationnamespace.servicebus.chinacloudapi.cn/...";
    private const string EventHubName = "hubdatamigration";
    
  6. 生成解决方案,然后运行 WindTurbineGenerator.exe 应用程序。Build the solution, then run the WindTurbineGenerator.exe application.

在数据仓库中验证捕获的数据Verify captured data in data warehouse

几分钟后,查询 SQL 数据仓库中的表。After a couple of minutes, query the table in your SQL data warehouse. 可以看到 WindTurbineDataGenerator 生成的数据已被 Azure Function 流式传输到事件中心,接着捕获到 Azure 存储容器中,然后迁移到 SQL 数据仓库。You observe that the data generated by WindTurbineDataGenerator has been streamed to your Event Hub, captured into an Azure Storage container, and then migrated into the SQL Data Warehouse table by Azure Function.

后续步骤Next steps

可以将强大的数据可视化工具与数据仓库配合使用,以便获取可行的见解。You can use powerful data visualization tools with your data warehouse to achieve actionable insights.

本文介绍如何将 Power BI 与 SQL 数据仓库配合使用This article shows how to use Power BI with SQL Data Warehouse