Quickstart: Create an Azure Data Factory using an ARM template

APPLIES TO: Azure Data Factory. Does not apply to: Azure Synapse Analytics (Preview).

This quickstart describes how to use an Azure Resource Manager template (ARM template) to create an Azure data factory. The pipeline you create in this data factory copies data from one folder to another folder in Azure Blob storage. For a tutorial on how to transform data using Azure Data Factory, see Tutorial: Transform data using Spark.

An ARM template is a JavaScript Object Notation (JSON) file that defines the infrastructure and configuration for your project. The template uses declarative syntax, which lets you state what you intend to deploy without having to write the sequence of programming commands to create it.

Note

This article does not provide a detailed introduction to the Data Factory service. For an introduction to the Azure Data Factory service, see Introduction to Azure Data Factory.

If your environment meets the prerequisites and you're familiar with using ARM templates, select the Deploy to Azure button. The template will open in the Azure portal.

Deploy to Azure

Prerequisites

Azure subscription

If you don't have an Azure subscription, create a trial account before you begin.

Create a file

Open a text editor such as Notepad, and create a file named emp.txt with the following content:

John, Doe
Jane, Doe

Save the file in the C:\ADFv2QuickStartPSH folder. (If the folder doesn't already exist, create it.)
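If you'd rather create the folder and file from a PowerShell prompt, the following sketch produces the same emp.txt in the same location:

# Create the sample folder (if needed) and the emp.txt file with two rows.
New-Item -ItemType Directory -Path "C:\ADFv2QuickStartPSH" -Force | Out-Null
Set-Content -Path "C:\ADFv2QuickStartPSH\emp.txt" -Value "John, Doe", "Jane, Doe"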

Review the template

The template used in this quickstart is from Azure Quickstart Templates.

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "dataFactoryName": {
            "type": "string",
            "defaultValue": "[concat('datafactory', uniqueString(resourceGroup().id))]",
            "metadata": {
                "description": "Data Factory Name"
            }
        },
        "location": {
            "type": "string",
            "defaultValue": "[resourceGroup().location]",
            "metadata": {
                "description": "Location of the data factory."
            }
        },
        "storageAccountName": {
            "type": "string",
            "defaultValue": "[concat('storage', uniqueString(resourceGroup().id))]",
            "metadata": {
                "description": "Name of the Azure storage account that contains the input/output data."
            }
        },
        "blobContainer": {
            "type": "string",
            "defaultValue": "[concat('blob', uniqueString(resourceGroup().id))]",
            "metadata": {
                "description": "Name of the blob container in the Azure Storage account."
            }
        }
    },
    "variables": {
        "storageAccountId": "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]",
        "storageLinkedService": "[resourceId('Microsoft.DataFactory/factories/linkedServices', parameters('dataFactoryName'), 'armTemplateStorageLinkedService')]",
        "datasetIn": "[resourceId('Microsoft.DataFactory/factories/datasets', parameters('dataFactoryName'), 'armTemplateTestDatasetIn')]",
        "datasetOut": "[resourceId('Microsoft.DataFactory/factories/datasets', parameters('dataFactoryName'), 'armTemplateTestDatasetOut')]"
    },
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2019-06-01",
            "name": "[parameters('storageAccountName')]",
            "location": "[parameters('location')]",
            "sku": {
                "name": "Standard_LRS"
            },
            "kind": "StorageV2",
            "properties": {},
            "resources": [
                {
                    "type": "blobServices/containers",
                    "apiVersion": "2019-06-01",
                    "name": "[concat('default/', parameters('blobContainer'))]",
                    "dependsOn": [
                        "[parameters('storageAccountName')]"
                    ]
                }
            ]
        },
        {
            "type": "Microsoft.DataFactory/factories",
            "name": "[parameters('dataFactoryName')]",
            "apiVersion": "2018-06-01",
            "location": "[parameters('location')]",
            "properties": {},
            "identity": {
                "type": "SystemAssigned"
            },
            "resources": [
                {
                    "type": "Microsoft.DataFactory/factories/linkedServices",
                    "name": "[concat(parameters('dataFactoryName'), '/ArmtemplateStorageLinkedService')]",
                    "apiVersion": "2018-06-01",
                    "location": "[parameters('location')]",
                    "dependsOn": [
                        "[parameters('dataFactoryName')]",
                        "[parameters('storageAccountName')]"
                    ],
                    "properties": {
                        "type": "AzureBlobStorage",
                        "typeProperties": {
                            "connectionString": "[concat('DefaultEndpointsProtocol=https;EndpointSuffix=core.chinacloudapi.cn;AccountName=',parameters('storageAccountName'),';AccountKey=',listKeys(variables('storageAccountId'), '2019-06-01').keys[0].value)]"
                        }
                    }
                },
                {
                    "type": "Microsoft.DataFactory/factories/datasets",
                    "name": "[concat(parameters('dataFactoryName'), '/ArmtemplateTestDatasetIn')]",
                    "apiVersion": "2018-06-01",
                    "location": "[parameters('location')]",
                    "dependsOn": [
                        "[parameters('dataFactoryName')]",
                        "[variables('storageLinkedService')]"
                    ],
                    "properties": {
                        "linkedServiceName": {
                            "referenceName": "ArmtemplateStorageLinkedService",
                            "type": "LinkedServiceReference"
                        },
                        "type": "Binary",
                        "typeProperties": {
                            "location": {
                                "type": "AzureBlobStorageLocation",
                                "container": "[parameters('blobContainer')]",
                                "folderPath": "input",
                                "fileName": "emp.txt"
                            }
                        }
                    }
                },
                {
                    "type": "Microsoft.DataFactory/factories/datasets",
                    "name": "[concat(parameters('dataFactoryName'), '/ArmtemplateTestDatasetOut')]",
                    "apiVersion": "2018-06-01",
                    "location": "[parameters('location')]",
                    "dependsOn": [
                        "[parameters('dataFactoryName')]",
                        "[variables('storageLinkedService')]"
                    ],
                    "properties": {
                        "linkedServiceName": {
                            "referenceName": "ArmtemplateStorageLinkedService",
                            "type": "LinkedServiceReference"
                        },
                        "type": "Binary",
                        "typeProperties": {
                            "location": {
                                "type": "AzureBlobStorageLocation",
                                "container": "[parameters('blobContainer')]",
                                "folderPath": "output"
                            }
                        }
                    }
                },
                {
                    "type": "Microsoft.DataFactory/factories/pipelines",
                    "name": "[concat(parameters('dataFactoryName'), '/ArmtemplateSampleCopyPipeline')]",
                    "apiVersion": "2018-06-01",
                    "location": "[parameters('location')]",
                    "dependsOn": [
                        "[parameters('dataFactoryName')]",
                        "[variables('datasetIn')]",
                        "[variables('datasetOut')]"
                    ],
                    "properties": {
                        "activities": [
                            {
                                "name": "MyCopyActivity",
                                "type": "Copy",
                                "policy": {
                                    "timeout": "7.00:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "typeProperties": {
                                    "source": {
                                        "type": "BinarySource",
                                        "storeSettings": {
                                            "type": "AzureBlobStorageReadSettings",
                                            "recursive": true
                                        }
                                    },
                                    "sink": {
                                        "type": "BinarySink",
                                        "storeSettings": {
                                            "type": "AzureBlobStorageWriteSettings"
                                        }
                                    },
                                    "enableStaging": false
                                },
                                "inputs": [
                                    {
                                        "referenceName": "ArmtemplateTestDatasetIn",
                                        "type": "DatasetReference",
                                        "parameters": {}
                                    }
                                ],
                                "outputs": [
                                    {
                                        "referenceName": "ArmtemplateTestDatasetOut",
                                        "type": "DatasetReference",
                                        "parameters": {}
                                    }
                                ]
                            }
                        ]
                    }
                }
            ]
        }
    ]
}

These Azure resources are defined in the template:

    • Microsoft.Storage/storageAccounts: Defines a storage account, with a nested blob container for the input and output data.
    • Microsoft.DataFactory/factories: Creates the Azure data factory, along with its nested linked service, input and output datasets, and the sample copy pipeline.

More Azure Data Factory template samples can be found in the quickstart template gallery.

Deploy the template

  1. Select the following image to sign in to Azure and open a template. The template creates an Azure Data Factory account, a storage account, and a blob container.

    Deploy to Azure

  2. Select or enter the following values.

    Deploy ADF ARM template

    Unless otherwise specified, use the default values to create the Azure Data Factory resources (a command-line alternative using Azure PowerShell is sketched after the list below):

    • Subscription: Select an Azure subscription.
    • Resource group: Select Create new, enter a unique name for the resource group, and then select OK.
    • Region: Select a location. For example, East US.
    • Data Factory Name: Use the default value.
    • Location: Use the default value.
    • Storage Account Name: Use the default value.
    • Blob Container: Use the default value.
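As referenced above, you can also deploy from the command line. The following Azure PowerShell sketch deploys the same template; it assumes you saved the JSON from the Review the template section as C:\ADFv2QuickStartPSH\azuredeploy.json, and the resource group name and region are placeholders:

# Sign in, create a resource group, and deploy the ARM template from the local file.
Connect-AzAccount
New-AzResourceGroup -Name "ADFQuickStartRG" -Location "East US"
New-AzResourceGroupDeployment `
    -ResourceGroupName "ADFQuickStartRG" `
    -TemplateFile "C:\ADFv2QuickStartPSH\azuredeploy.json"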

Review deployed resources

  1. Select Go to resource group.

    Resource group

  2. Verify that your Azure data factory is created.

    1. Your Azure Data Factory name is in the format datafactory<uniqueid>.

    Data factory example

  3. Verify that your storage account is created.

    1. The storage account name is in the format storage<uniqueid>.

    Storage account

  4. Select the storage account that was created, and then select Containers.

    1. On the Containers page, select the blob container you created.
      1. The blob container name is in the format blob<uniqueid>.

    Blob container
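You can run the same checks from Azure PowerShell. A minimal sketch, where the resource group name is a placeholder for the one you created:

# Confirm that the data factory and the storage account exist in the resource group.
$resourceGroupName = "ADFQuickStartRG"   # placeholder: the resource group you deployed to
Get-AzDataFactoryV2 -ResourceGroupName $resourceGroupName
Get-AzStorageAccount -ResourceGroupName $resourceGroupName |
    Select-Object StorageAccountName, Location, ProvisioningState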

Upload a file

  1. On the Containers page, select Upload.

  2. In the right pane, select the Files box, and then browse to and select the emp.txt file that you created earlier.

  3. Expand the Advanced heading.

  4. In the Upload to folder box, enter input.

  5. Select the Upload button. You should see the emp.txt file and the status of the upload in the list.

  6. Select the Close icon (an X) to close the Upload blob page.

    Upload the file to the input folder

Keep the container page open, because you can use it to verify the output at the end of this quickstart.
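As an alternative to the portal upload, the following Azure PowerShell sketch uploads emp.txt to the input folder. The resource group name is a placeholder; the storage account and blob container are looked up by the name prefixes the template generates:

# Locate the storage account and container created by the template, then upload the file.
$resourceGroupName = "ADFQuickStartRG"   # placeholder: the resource group you deployed to
$storageAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroupName |
    Where-Object { $_.StorageAccountName -like "storage*" }
$container = Get-AzStorageContainer -Context $storageAccount.Context |
    Where-Object { $_.Name -like "blob*" }

Set-AzStorageBlobContent -File "C:\ADFv2QuickStartPSH\emp.txt" `
    -Container $container.Name `
    -Blob "input/emp.txt" `
    -Context $storageAccount.Context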

Start trigger

  1. Go to the Data factories page, and select the data factory you created.

  2. Select the Author & Monitor tile.

    Author & Monitor

  3. Select the Author tab.

  4. Select the pipeline that was created: ArmtemplateSampleCopyPipeline.

    ARM template pipeline

  5. Select Add Trigger > Trigger Now.

    Trigger

  6. In the pane on the right, under Pipeline run, select OK.
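If you'd rather start the run from Azure PowerShell than from the UI, a sketch like the following works; the resource group and data factory names are placeholders for the ones you deployed:

# Trigger the sample pipeline and capture the run ID for monitoring.
$resourceGroupName = "ADFQuickStartRG"          # placeholder
$dataFactoryName   = "datafactory<uniqueid>"    # placeholder: the generated factory name
$runId = Invoke-AzDataFactoryV2Pipeline `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName `
    -PipelineName "ArmtemplateSampleCopyPipeline"
$runId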

Monitor the pipeline

  1. Select the Monitor tab.

  2. You see the activity runs associated with the pipeline run. In this quickstart, the pipeline has only one activity, of type Copy, so you see a single run for that activity.

    Successful run
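The same information is available from Azure PowerShell. The sketch below finds the most recent run of the sample pipeline and lists its activity runs; the resource group and data factory names are placeholders:

# Find the most recent run of the sample pipeline in the last hour, then list its activity runs.
$resourceGroupName = "ADFQuickStartRG"          # placeholder
$dataFactoryName   = "datafactory<uniqueid>"    # placeholder
$run = Get-AzDataFactoryV2PipelineRun `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName `
    -LastUpdatedAfter (Get-Date).AddHours(-1) `
    -LastUpdatedBefore (Get-Date).AddHours(1) |
    Where-Object { $_.PipelineName -eq "ArmtemplateSampleCopyPipeline" } |
    Sort-Object RunStart -Descending | Select-Object -First 1
$run.Status

Get-AzDataFactoryV2ActivityRun `
    -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName `
    -PipelineRunId $run.RunId `
    -RunStartedAfter (Get-Date).AddHours(-1) `
    -RunStartedBefore (Get-Date).AddHours(1)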

Verify the output file

The pipeline automatically creates an output folder in the blob container. It then copies the emp.txt file from the input folder to the output folder.

  1. In the Azure portal, on the Containers page, select Refresh to see the output folder.

  2. Select output in the folder list.

  3. Confirm that emp.txt was copied to the output folder.

    Output
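You can also confirm the copy from Azure PowerShell by listing the blobs under the output folder. The lookup mirrors the upload sketch above, and the resource group name is a placeholder:

# Expect output/emp.txt after a successful run.
$resourceGroupName = "ADFQuickStartRG"   # placeholder
$storageAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroupName |
    Where-Object { $_.StorageAccountName -like "storage*" }
$container = Get-AzStorageContainer -Context $storageAccount.Context |
    Where-Object { $_.Name -like "blob*" }
Get-AzStorageBlob -Container $container.Name -Prefix "output/" -Context $storageAccount.Context |
    Select-Object Name, Length, LastModified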

Clean up resources

You can clean up the resources that you created in this quickstart in two ways. You can delete the Azure resource group, which includes all the resources in the resource group. Or, if you want to keep the other resources intact, delete only the data factory you created in this quickstart.
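Both cleanup commands below rely on PowerShell variables for the names you deployed; set them first. The values shown are placeholders:

$resourceGroupName = "ADFQuickStartRG"          # placeholder: the resource group you created
$dataFactoryName   = "datafactory<uniqueid>"    # placeholder: the generated data factory name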

Deleting a resource group deletes all of its resources, including any data factories in it. Run the following command to delete the entire resource group:

Remove-AzResourceGroup -ResourceGroupName $resourceGroupName

If you want to delete just the data factory, and not the entire resource group, run the following command:

Remove-AzDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName

Next steps

In this quickstart, you created an Azure data factory using an ARM template and validated the deployment. To learn more about Azure Data Factory and Azure Resource Manager, continue on to the articles below.