Tutorial: Create on-demand Apache Hadoop clusters in HDInsight using Azure Data Factory

In this tutorial, you learn how to create an Apache Hadoop cluster, on demand, in Azure HDInsight using Azure Data Factory. You then use data pipelines in Azure Data Factory to run Hive jobs and delete the cluster. By the end of this tutorial, you know how to operationalize a big data job run where cluster creation, job run, and cluster deletion are performed on a schedule.

This tutorial covers the following tasks:

  • Create an Azure storage account
  • Understand Azure Data Factory activity
  • Create a data factory using the Azure portal
  • Create linked services
  • Create a pipeline
  • Trigger a pipeline
  • Monitor a pipeline
  • Verify the output

If you don't have an Azure subscription, create a trial account before you begin.

Prerequisites

  • The PowerShell Az module installed.

  • An Azure Active Directory service principal. Once you've created the service principal, be sure to retrieve the application ID and authentication key using the instructions in the linked article; you need these values later in this tutorial. Also make sure the service principal is a member of the Contributor role of the subscription or of the resource group in which the cluster is created. For instructions to retrieve the required values and assign the right roles, see Create an Azure Active Directory service principal. A scripted sketch of these steps follows this list.
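
If you prefer to script the service principal setup, the following is a minimal sketch that assumes the Az module and that you're already signed in with Connect-AzAccount. The display name is hypothetical, and the property that exposes the application ID and the generated secret varies across Az versions, so check the returned objects.

# Create a service principal (an app registration with a generated client secret).
$sp = New-AzADServicePrincipal -DisplayName "hdi-adf-tutorial-sp"
# The application ID is on the returned object (AppId in recent Az versions, ApplicationId in older ones);
# the generated secret is also on the object, under a version-dependent property.

# Grant the service principal Contributor rights on the resource group that will host the cluster.
# Run this after the resource group from the next section exists, or scope the assignment to your subscription instead.
New-AzRoleAssignment `
    -ApplicationId $sp.AppId `
    -RoleDefinitionName "Contributor" `
    -ResourceGroupName "<Azure Resource Group Name>"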

Create preliminary Azure objects

In this section, you create various objects that will be used for the HDInsight cluster you create on demand. The storage account you create will contain the sample HiveQL script, partitionweblogs.hql, which you use to simulate a sample Apache Hive job that runs on the cluster.

This section uses an Azure PowerShell script to create the storage account and copy over the required files within the storage account. The Azure PowerShell sample script in this section performs the following tasks:

  1. Signs in to Azure.
  2. Creates an Azure resource group.
  3. Creates an Azure Storage account.
  4. Creates a Blob container in the storage account.
  5. Copies the sample HiveQL script (partitionweblogs.hql) to the Blob container. The script is available at https://hditutorialdata.blob.core.windows.net/adfhiveactivity/script/partitionweblogs.hql. The sample script is already available in another public Blob container. The PowerShell script below copies these files into the Azure Storage account it creates.

Create storage account and copy files

Important

Specify names for the Azure resource group and the Azure storage account that will be created by the script. Write down the resource group name, storage account name, and storage account key output by the script. You need them in the next section.

$resourceGroupName = "<Azure Resource Group Name>"
$storageAccountName = "<Azure Storage Account Name>"
$location = "East US"

$sourceStorageAccountName = "hditutorialdata"  
$sourceContainerName = "adfv2hiveactivity"

$destStorageAccountName = $storageAccountName
$destContainerName = "adfgetstarted" # don't change this value.

####################################
# Connect to Azure
####################################
#region - Connect to Azure subscription
Write-Host "`nConnecting to your Azure subscription ..." -ForegroundColor Green
$sub = Get-AzSubscription -ErrorAction SilentlyContinue
if(-not($sub))
{
    Connect-AzAccount
}

# If you have multiple subscriptions, set the one to use
# Select-AzSubscription -SubscriptionId "<SUBSCRIPTIONID>"

#endregion

####################################
# Create a resource group, storage, and container
####################################

#region - create Azure resources
Write-Host "`nCreating resource group, storage account and blob container ..." -ForegroundColor Green

New-AzResourceGroup `
    -Name $resourceGroupName `
    -Location $location

New-AzStorageAccount `
    -ResourceGroupName $resourceGroupName `
    -Name $destStorageAccountName `
    -Kind StorageV2 `
    -Location $location `
    -SkuName Standard_LRS `
    -EnableHttpsTrafficOnly 1

$destStorageAccountKey = (Get-AzStorageAccountKey `
    -ResourceGroupName $resourceGroupName `
    -Name $destStorageAccountName)[0].Value

$sourceContext = New-AzStorageContext `
    -StorageAccountName $sourceStorageAccountName `
    -Anonymous

$destContext = New-AzStorageContext `
    -StorageAccountName $destStorageAccountName `
    -StorageAccountKey $destStorageAccountKey

New-AzStorageContainer `
    -Name $destContainerName `
    -Context $destContext
#endregion

####################################
# Copy files
####################################
#region - copy files
Write-Host "`nCopying files ..." -ForegroundColor Green

$blobs = Get-AzStorageBlob `
    -Context $sourceContext `
    -Container $sourceContainerName `
    -Blob "hivescripts\hivescript.hql"

$blobs|Start-AzStorageBlobCopy `
    -DestContext $destContext `
    -DestContainer $destContainerName `
    -DestBlob "hivescripts\partitionweblogs.hql"

Write-Host "`nCopied files ..." -ForegroundColor Green
Get-AzStorageBlob `
    -Context $destContext `
    -Container $destContainerName
#endregion

Write-host "`nYou will use the following values:" -ForegroundColor Green
write-host "`nResource group name: $resourceGroupName"
Write-host "Storage Account Name: $destStorageAccountName"
write-host "Storage Account Key: $destStorageAccountKey"

Write-host "`nScript completed" -ForegroundColor Green

Verify storage account

  1. Sign in to the Azure portal.
  2. From the left, navigate to All services > General > Resource groups.
  3. Select the resource group name you created in your PowerShell script. Use the filter if you have too many resource groups listed.
  4. From the Overview view, you see one resource listed unless you share the resource group with other projects. That resource is the storage account with the name you specified earlier. Select the storage account name.
  5. Select the Containers tile.
  6. Select the adfgetstarted container. You see a folder called hivescripts.
  7. Open the folder and make sure it contains the sample script file, partitionweblogs.hql. Alternatively, run the PowerShell check sketched after these steps.
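
If you'd rather stay in PowerShell, the following check lists the copied script. It's a minimal sketch that assumes the $destContext and $destContainerName variables from the earlier script are still in scope.

# List the Hive script that was copied into the adfgetstarted container.
Get-AzStorageBlob `
    -Context $destContext `
    -Container $destContainerName `
    -Prefix "hivescripts" |
    Select-Object Name, Length, LastModified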

Understand the Azure Data Factory activity

Azure Data Factory orchestrates and automates the movement and transformation of data. Azure Data Factory can create an HDInsight Hadoop cluster just-in-time to process an input data slice and delete the cluster when the processing is complete.

In Azure Data Factory, a data factory can have one or more data pipelines. A data pipeline has one or more activities. There are two types of activities:

  • Data Movement Activities - You use data movement activities to move data from a source data store to a destination data store.
  • Data Transformation Activities - You use data transformation activities to transform or process data. HDInsight Hive Activity is one of the transformation activities supported by Data Factory. You use the Hive transformation activity in this tutorial.

In this article, you configure the Hive activity to create an on-demand HDInsight Hadoop cluster. When the activity runs to process data, here's what happens:

  1. An HDInsight Hadoop cluster is automatically created for you just-in-time to process the slice.

  2. The input data is processed by running a HiveQL script on the cluster. In this tutorial, the HiveQL script associated with the Hive activity performs the following actions:

    • Uses the existing table (hivesampletable) to create another table, HiveSampleOut.
    • Populates the HiveSampleOut table with only specific columns from the original hivesampletable.
  3. The HDInsight Hadoop cluster is deleted after the processing is complete and the cluster has been idle for the configured amount of time (the timeToLive setting). If the next data slice is available for processing within this timeToLive idle time, the same cluster is used to process it.

Create a data factory

  1. Sign in to the Azure portal.

  2. From the left menu, navigate to + Create a resource > Analytics > Data Factory.

    Azure Data Factory on the portal

  3. Enter or select the following values for the New data factory tile:

    Name: Enter a name for the data factory. This name must be globally unique.
    Version: Leave at V2.
    Subscription: Select your Azure subscription.
    Resource group: Select the resource group you created using the PowerShell script.
    Location: The location is automatically set to the location you specified while creating the resource group earlier. For this tutorial, the location is set to East US.
    Enable GIT: Uncheck this box.

    Create Azure Data Factory using the Azure portal

  4. Select Create. Creating a data factory might take anywhere between 2 to 4 minutes. (If you prefer scripting, a PowerShell equivalent is sketched after these steps.)

  5. Once the data factory is created, you'll receive a Deployment succeeded notification with a Go to resource button. Select Go to resource to open the Data Factory default view.

  6. Select Author & Monitor to launch the Azure Data Factory authoring and monitoring portal.

    Azure Data Factory portal overview
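
As an alternative to the portal steps above, the data factory can be created with the Az.DataFactory module. This is a minimal sketch that assumes the module is installed and that $resourceGroupName is still set from the earlier script; the factory name placeholder must be replaced with a globally unique name.

# Create a V2 data factory in the resource group created earlier.
New-AzDataFactoryV2 `
    -ResourceGroupName $resourceGroupName `
    -Name "<globally unique data factory name>" `
    -Location "East US"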

Create linked services

In this section, you author two linked services within your data factory.

  • An Azure Storage linked service that links an Azure storage account to the data factory. This storage is used by the on-demand HDInsight cluster. It also contains the Hive script that is run on the cluster.
  • An on-demand HDInsight linked service. Azure Data Factory automatically creates an HDInsight cluster and runs the Hive script. It then deletes the HDInsight cluster after the cluster has been idle for a preconfigured time.

Create an Azure Storage linked service

  1. From the left pane of the Let's get started page, select the Author icon.

    Create an Azure Data Factory linked service

  2. Select Connections from the bottom-left corner of the window and then select + New.

    Create connections in Azure Data Factory

  3. In the New Linked Service dialog box, select Azure Blob Storage and then select Continue.

    Create Azure Storage linked service for Data Factory

  4. Provide the following values for the storage linked service:

    Name: Enter HDIStorageLinkedService.
    Azure subscription: Select your subscription from the drop-down list.
    Storage account name: Select the Azure Storage account you created as part of the PowerShell script.

    Select Test connection and, if successful, select Create. (A scripted equivalent is sketched after these steps.)

    Provide a name for the Azure Storage linked service
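
The same linked service can be authored from PowerShell by supplying a JSON definition. The following is a minimal sketch, assuming the Az.DataFactory module; the data factory name placeholder and the local file name are hypothetical, the connection string is built from the earlier script's output variables, and you should verify the property names against the current Azure Blob Storage linked service reference.

# JSON definition for the storage linked service (written to a hypothetical local file).
@"
{
  "name": "HDIStorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=$destStorageAccountName;AccountKey=$destStorageAccountKey"
    }
  }
}
"@ | Out-File .\HDIStorageLinkedService.json

# Deploy the linked service to the data factory.
Set-AzDataFactoryV2LinkedService `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName "<data factory name>" `
    -Name "HDIStorageLinkedService" `
    -DefinitionFile ".\HDIStorageLinkedService.json"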

Create an on-demand HDInsight linked service

  1. Select the + New button again to create another linked service.

  2. In the New Linked Service window, select the Compute tab.

  3. Select Azure HDInsight, and then select Continue.

    Create HDInsight linked service for Azure Data Factory

  4. In the New Linked Service window, enter the following values and leave the rest as default:

    Name: Enter HDInsightLinkedService.
    Type: Select On-demand HDInsight.
    Azure Storage Linked Service: Select HDIStorageLinkedService.
    Cluster type: Select hadoop.
    Time to live: Provide the duration for which you want the HDInsight cluster to be available before it's automatically deleted.
    Service principal ID: Provide the application ID of the Azure Active Directory service principal you created as part of the prerequisites.
    Service principal key: Provide the authentication key for the Azure Active Directory service principal.
    Cluster name prefix: Provide a value that will be prefixed to all the cluster types that are created by the data factory.
    Subscription: Select your subscription from the drop-down list.
    Select resource group: Select the resource group you created as part of the PowerShell script you used earlier.
    OS type/Cluster SSH user name: Enter an SSH user name, commonly sshuser.
    OS type/Cluster SSH password: Provide a password for the SSH user.
    OS type/Cluster user name: Enter a cluster user name, commonly admin.
    OS type/Cluster password: Provide a password for the cluster user.

    Then select Create. (A scripted equivalent is sketched after these steps.)

    Provide values for the HDInsight linked service
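
The on-demand HDInsight linked service can also be defined in JSON and deployed with PowerShell. The following is a hedged, incomplete sketch rather than a definitive definition: the property names follow the on-demand HDInsight compute linked service schema, but verify them (and add any others your scenario requires, such as the cluster version or OS credentials) against the current Data Factory reference. The placeholder values are the IDs and key you collected in the prerequisites, and the file name is hypothetical.

# JSON definition for the on-demand HDInsight linked service (written to a hypothetical local file).
@"
{
  "name": "HDInsightLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "clusterType": "hadoop",
      "clusterSize": 4,
      "timeToLive": "00:15:00",
      "hostSubscriptionId": "<subscription id>",
      "servicePrincipalId": "<service principal application id>",
      "servicePrincipalKey": { "type": "SecureString", "value": "<service principal authentication key>" },
      "tenant": "<tenant id>",
      "clusterResourceGroup": "<resource group name>",
      "clusterNamePrefix": "hdiadf",
      "linkedServiceName": { "referenceName": "HDIStorageLinkedService", "type": "LinkedServiceReference" }
    }
  }
}
"@ | Out-File .\HDInsightLinkedService.json

Set-AzDataFactoryV2LinkedService `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName "<data factory name>" `
    -Name "HDInsightLinkedService" `
    -DefinitionFile ".\HDInsightLinkedService.json"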

Create a pipeline

  1. Select the + (plus) button, and then select Pipeline.

    Create a pipeline in Azure Data Factory

  2. In the Activities toolbox, expand HDInsight, and drag the Hive activity to the pipeline designer surface. In the General tab, provide a name for the activity.

    Add activities to the Data Factory pipeline

  3. Make sure you have the Hive activity selected, select the HDI Cluster tab, and from the HDInsight Linked Service drop-down list, select the linked service you created earlier for HDInsight, HDInsightLinkedService.

    Provide HDInsight cluster details for the pipeline

  4. Select the Script tab and complete the following steps:

    1. For Script Linked Service, select HDIStorageLinkedService from the drop-down list. This value is the storage linked service you created earlier.

    2. For File Path, select Browse Storage and navigate to the location where the sample Hive script is available. If you ran the PowerShell script earlier, this location should be adfgetstarted/hivescripts/partitionweblogs.hql.

      Provide Hive script details for the pipeline

    3. Under Advanced > Parameters, select Auto-fill from script. This option looks for any parameters in the Hive script that require values at runtime.

    4. In the value text box, add the existing folder in the format wasbs://adfgetstarted@<StorageAccount>.blob.core.windows.net/outputfolder/. The path is case-sensitive. This is the path where the output of the script will be stored. The wasbs scheme is necessary because storage accounts now have secure transfer required enabled by default.

      Provide parameters for the Hive script

  5. Select Validate to validate the pipeline. Select the >> (right arrow) button to close the validation window.

    Validate the Azure Data Factory pipeline

  6. Finally, select Publish All to publish the artifacts to Azure Data Factory. (A JSON sketch of the resulting pipeline follows these steps.)

    Publish the Azure Data Factory pipeline
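
For reference, a pipeline equivalent to the one built above can be expressed in JSON and deployed with Set-AzDataFactoryV2Pipeline. This is a minimal sketch, assuming the Az.DataFactory module and the two linked services created earlier; the pipeline name, activity name, and parameter name (Output) are illustrative and should be matched to what your Hive script actually expects.

# JSON definition for a pipeline with a single HDInsight Hive activity (written to a hypothetical local file).
@"
{
  "name": "hivepipeline",
  "properties": {
    "activities": [
      {
        "name": "RunPartitionWebLogs",
        "type": "HDInsightHive",
        "linkedServiceName": { "referenceName": "HDInsightLinkedService", "type": "LinkedServiceReference" },
        "typeProperties": {
          "scriptPath": "adfgetstarted/hivescripts/partitionweblogs.hql",
          "scriptLinkedService": { "referenceName": "HDIStorageLinkedService", "type": "LinkedServiceReference" },
          "defines": {
            "Output": "wasbs://adfgetstarted@<StorageAccount>.blob.core.windows.net/outputfolder/"
          }
        }
      }
    ]
  }
}
"@ | Out-File .\hivepipeline.json

Set-AzDataFactoryV2Pipeline `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName "<data factory name>" `
    -Name "hivepipeline" `
    -DefinitionFile ".\hivepipeline.json"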

Trigger a pipeline

  1. From the toolbar on the designer surface, select Add trigger > Trigger Now.

    Trigger the Azure Data Factory pipeline

  2. Select OK in the pop-up side bar. (A PowerShell equivalent is sketched below.)
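
From PowerShell, a published pipeline can be triggered with Invoke-AzDataFactoryV2Pipeline, which returns a run ID you can use for monitoring. A minimal sketch, assuming the pipeline name used in the earlier sketch:

# Start a pipeline run and capture the run ID for monitoring.
$runId = Invoke-AzDataFactoryV2Pipeline `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName "<data factory name>" `
    -PipelineName "hivepipeline"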

Monitor a pipeline

  1. Switch to the Monitor tab on the left. You see a pipeline run in the Pipeline Runs list. Notice the status of the run under the Status column.

    Monitor the Azure Data Factory pipeline

  2. Select Refresh to refresh the status.

  3. You can also select the View Activity Runs icon to see the activity run associated with the pipeline. In the screenshot below, you see only one activity run because there's only one activity in the pipeline you created. To switch back to the previous view, select Pipelines towards the top of the page. (A PowerShell equivalent is sketched after these steps.)

    Monitor the Azure Data Factory pipeline activity
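
The same information can be retrieved from PowerShell by using the run ID captured earlier. A minimal sketch, assuming the Az.DataFactory module:

# Check the status of the pipeline run started earlier.
Get-AzDataFactoryV2PipelineRun `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName "<data factory name>" `
    -PipelineRunId $runId

# Inspect the activity runs within a window around the pipeline run.
Get-AzDataFactoryV2ActivityRun `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName "<data factory name>" `
    -PipelineRunId $runId `
    -RunStartedAfter (Get-Date).AddHours(-1) `
    -RunStartedBefore (Get-Date).AddHours(1)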

Verify the output

  1. To verify the output, in the Azure portal navigate to the storage account that you used for this tutorial. You should see the following folders or containers (a PowerShell check is sketched after this list):

    • You see an adfgetstarted/outputfolder that contains the output of the Hive script that was run as part of the pipeline.

    • You see an adfhdidatafactory-<linked-service-name>-<timestamp> container. This container is the default storage location of the HDInsight cluster that was created as part of the pipeline run.

    • You see an adfjobs container that has the Azure Data Factory job logs.

      Verify the Azure Data Factory pipeline output
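
You can also list the containers and the script output from PowerShell. A minimal sketch, assuming the $destContext variable from the earlier script is still in scope:

# List all containers in the storage account (adfgetstarted, adfhdidatafactory-*, adfjobs).
Get-AzStorageContainer -Context $destContext | Select-Object Name

# List the Hive output written under adfgetstarted/outputfolder.
Get-AzStorageBlob `
    -Context $destContext `
    -Container "adfgetstarted" `
    -Prefix "outputfolder" |
    Select-Object Name, Length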

Clean up resources

With on-demand HDInsight cluster creation, you don't need to explicitly delete the HDInsight cluster. The cluster is deleted based on the configuration you provided while creating the pipeline. However, even after the cluster is deleted, the storage accounts associated with the cluster continue to exist. This behavior is by design so that you can keep your data intact. If you don't want to persist the data, you may delete the storage account you created.

Alternatively, you can delete the entire resource group that you created for this tutorial. Doing so deletes the storage account and the Azure Data Factory that you created.

Delete the resource group

  1. Sign in to the Azure portal.

  2. Select Resource groups on the left pane.

  3. Select the resource group name you created in your PowerShell script. Use the filter if you have too many resource groups listed. This opens the resource group.

  4. On the Resources tile, you'll see the default storage account and the data factory listed, unless you share the resource group with other projects.

  5. Select Delete resource group. Doing so deletes the storage account and the data stored in it.

    Azure portal delete resource group

  6. Enter the resource group name to confirm deletion, and then select Delete. (A PowerShell equivalent is sketched below.)
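
The same cleanup can be done from PowerShell. A minimal sketch, assuming $resourceGroupName is still set; this removes the resource group and everything in it, including the storage account and the data factory.

# Delete the tutorial resource group and all resources it contains (prompts for confirmation).
Remove-AzResourceGroup -Name $resourceGroupName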

Next steps

In this article, you learned how to use Azure Data Factory to create an on-demand HDInsight cluster and run Apache Hive jobs. Advance to the next article to learn how to create HDInsight clusters with custom configuration.