Tutorial: Create an Azure data factory using an Azure Resource Manager template

This quickstart describes how to use an Azure Resource Manager template to create an Azure data factory. The pipeline you create in this data factory copies data from one folder to another folder in Azure Blob storage. For a tutorial on how to transform data using Azure Data Factory, see Tutorial: Transform data using Spark.

Note

This article does not provide a detailed introduction of the Data Factory service. For an introduction to the Azure Data Factory service, see Introduction to Azure Data Factory.

Prerequisites

Azure subscription

If you don't have an Azure subscription, create a 1rmb trial account before you begin.

Azure roles

To create Data Factory instances, the user account that you use to sign in to Azure must be a member of the contributor or owner role, or an administrator of the Azure subscription. To view the permissions that you have in the subscription, go to the Azure portal, select your username in the upper-right corner, select More options (...), and then select My permissions. If you have access to multiple subscriptions, select the appropriate subscription.

To create and manage child resources for Data Factory - including datasets, linked services, pipelines, triggers, and integration runtimes - the following requirements apply:

  • To create and manage child resources in the Azure portal, you must belong to the Data Factory Contributor role at the resource group level or above.
  • To create and manage child resources with PowerShell or the SDK, the contributor role at the resource level or above is sufficient.

For sample instructions about how to add a user to a role, see the Add roles article.
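
If you'd rather check or grant these roles from PowerShell, the following is a minimal sketch (the sign-in name and resource group name are placeholder assumptions; both cmdlets are part of the Az.Resources module):

# List the role assignments a user holds at the resource group scope (placeholder values).
Get-AzRoleAssignment -SignInName "user@contoso.partner.onmschina.cn" -ResourceGroupName "ADFTutorialResourceGroup"

# Grant the Data Factory Contributor role at the resource group scope.
New-AzRoleAssignment -SignInName "user@contoso.partner.onmschina.cn" -RoleDefinitionName "Data Factory Contributor" -ResourceGroupName "ADFTutorialResourceGroup"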


Azure storage account

You use a general-purpose Azure storage account (specifically Blob storage) as both source and destination data stores in this quickstart. If you don't have a general-purpose Azure storage account, see Create a storage account to create one.
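
If you prefer PowerShell to the portal, a sketch like the following should create a suitable general-purpose account (the resource group, account name, region, and SKU here are placeholder assumptions; storage account names must be globally unique and lowercase):

# Create a resource group and a general-purpose v2 storage account (placeholder names).
New-AzResourceGroup -Name "ADFTutorialResourceGroup" -Location "China East 2"
New-AzStorageAccount -ResourceGroupName "ADFTutorialResourceGroup" `
    -Name "<yourstorageaccountname>" `
    -Location "China East 2" `
    -SkuName Standard_LRS `
    -Kind StorageV2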

Get the storage account name

You will need the name of your Azure storage account for this quickstart. The following procedure provides steps to get the name of your storage account:

  1. In a web browser, go to the Azure portal and sign in using your Azure username and password.
  2. From the Azure portal menu, select All services, then select Storage > Storage accounts. You can also search for and select Storage accounts from any page.
  3. In the Storage accounts page, filter for your storage account (if needed), and then select your storage account.
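
Alternatively, you can list the storage accounts in your subscription from PowerShell; a quick sketch:

# List storage account names and their resource groups in the current subscription.
Get-AzStorageAccount | Select-Object StorageAccountName, ResourceGroupName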


Create a blob container

In this section, you create a blob container named adftutorial in Azure Blob storage. (If you prefer PowerShell, a sketch follows these steps.)

  1. From the storage account page, select Overview > Blobs.

  2. On the <Account name> - Blobs page's toolbar, select Container.

  3. In the New container dialog box, enter adftutorial for the name, and then select OK. The <Account name> - Blobs page is updated to include adftutorial in the list of containers.

    List of containers
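
The same container can also be created from PowerShell. A sketch, assuming the storage account name and key from the Prerequisites section (New-AzStorageContext and New-AzStorageContainer are part of the Az.Storage module):

# Build a storage context from the account name and key (placeholders), then create the container.
$context = New-AzStorageContext -StorageAccountName "<yourstorageaccountname>" -StorageAccountKey "<yourstorageaccountkey>"
New-AzStorageContainer -Name "adftutorial" -Context $context -Permission Off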

Add an input folder and file for the blob container

In this section, you create a folder named input in the container you just created, and then upload a sample file to the input folder. Before you begin, open a text editor such as Notepad, and create a file named emp.txt with the following content:

John, Doe
Jane, Doe

Save the file in the C:\ADFv2QuickStartPSH folder. (If the folder doesn't already exist, create it.) Then return to the Azure portal and follow these steps:

  1. In the <Account name> - Blobs page where you left off, select adftutorial from the updated list of containers.

    1. If you closed the window or went to another page, sign in to the Azure portal again.
    2. From the Azure portal menu, select All services, then select Storage > Storage accounts. You can also search for and select Storage accounts from any page.
    3. Select your storage account, and then select Blobs > adftutorial.
  2. On the adftutorial container page's toolbar, select Upload.

  3. In the Upload blob page, select the Files box, and then browse to and select the emp.txt file.

  4. Expand the Advanced heading. The page now displays as shown:

    Select the Advanced link

  5. In the Upload to folder box, enter input.

  6. Select the Upload button. You should see the emp.txt file and the status of the upload in the list.

  7. Select the Close icon (an X) to close the Upload blob page.

Keep the adftutorial container page open. You use it to verify the output at the end of this quickstart. (These upload steps can also be done from PowerShell; see the sketch below.)
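
As mentioned above, the input folder and file can also be created from PowerShell instead of the portal. A sketch, reusing the $context from the container sketch earlier (in Blob storage the "input/" blob name prefix acts as the folder):

# Create the local sample file, then upload it as input/emp.txt.
New-Item -ItemType Directory -Path "C:\ADFv2QuickStartPSH" -Force | Out-Null
Set-Content -Path "C:\ADFv2QuickStartPSH\emp.txt" -Value "John, Doe`nJane, Doe"
Set-AzStorageBlobContent -File "C:\ADFv2QuickStartPSH\emp.txt" `
    -Container "adftutorial" `
    -Blob "input/emp.txt" `
    -Context $context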

Azure PowerShell

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Install the latest Azure PowerShell modules by following the instructions in How to install and configure Azure PowerShell.
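
For example, a minimal sketch (Connect-AzAccount -Environment AzureChinaCloud signs in to the Azure China cloud that this tutorial targets):

# Install the Az module for the current user and sign in to the Azure China cloud.
Install-Module -Name Az -Scope CurrentUser -Repository PSGallery -Force
Connect-AzAccount -Environment AzureChinaCloud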

Resource Manager templates

To learn about Azure Resource Manager templates in general, see Authoring Azure Resource Manager Templates.

The following section provides the complete Resource Manager template for defining Data Factory entities so that you can quickly run through the tutorial and test the template. To understand how each Data Factory entity is defined, see the Data Factory entities in the template section.

To learn about the JSON syntax and properties for Data Factory resources in a template, see Microsoft.DataFactory resource types.

Data Factory JSON

Create a JSON file named ADFTutorialARM.json in the C:\ADFTutorial folder (create the ADFTutorial folder if it doesn't already exist) with the following content:

{  
    "$schema":"http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion":"1.0.0.0",
    "parameters":{  
        "dataFactoryName":{  
            "type":"string",
            "metadata":"Data Factory Name"
        },
        "dataFactoryLocation":{  
            "type":"string",
            "defaultValue":"China East 2",
            "metadata":{  
                "description":"Location of the data factory. Currently, China East 2 is supported. "
            }
        },
        "storageAccountName":{  
            "type":"string",
            "metadata":{  
                "description":"Name of the Azure storage account that contains the input/output data."
            }
        },
        "storageAccountKey":{  
            "type":"securestring",
            "metadata":{  
                "description":"Key for the Azure storage account."
            }
        },
        "triggerStartTime": {
            "type": "string",
            "metadata": {
                "description": "Start time for the trigger."
            }
        },
        "triggerEndTime": {
            "type": "string",
            "metadata": {
                "description": "End time for the trigger."
            }
        }
    },      
    "variables":{  
        "factoryId":"[concat('Microsoft.DataFactory/factories/', parameters('dataFactoryName'))]"
    },
    "resources":[  
        {  
            "name":"[parameters('dataFactoryName')]",
            "apiVersion":"2018-06-01",
            "type":"Microsoft.DataFactory/factories",
            "location":"[parameters('dataFactoryLocation')]",
            "identity":{  
                "type":"SystemAssigned"
            },
            "resources":[  
                {  
                    "name":"[concat(parameters('dataFactoryName'), '/ArmtemplateStorageLinkedService')]",
                    "type":"Microsoft.DataFactory/factories/linkedServices",
                    "apiVersion":"2018-06-01",
                    "properties":{  
                        "annotations":[  

                        ],
                        "type":"AzureBlobStorage",
                        "typeProperties":{  
                            "connectionString": "[concat('DefaultEndpointsProtocol=https;AccountName=',parameters('storageAccountName'),';AccountKey=',parameters('storageAccountKey'),';EndpointSuffix=core.chinacloudapi.cn')]"
                        }
                    },
                    "dependsOn":[  
                        "[parameters('dataFactoryName')]"
                    ]
                },
                {  
                    "name":"[concat(parameters('dataFactoryName'), '/ArmtemplateTestDatasetIn')]",
                    "type":"Microsoft.DataFactory/factories/datasets",
                    "apiVersion":"2018-06-01",
                    "properties":{  
                        "linkedServiceName":{  
                            "referenceName":"ArmtemplateStorageLinkedService",
                            "type":"LinkedServiceReference"
                        },
                        "annotations":[  

                        ],
                        "type":"Binary",
                        "typeProperties":{  
                            "location":{  
                                "type":"AzureBlobStorageLocation",
                                "fileName":"emp.txt",
                                "folderPath":"input",
                                "container":"adftutorial"
                            }
                        }
                    },
                    "dependsOn":[  
                        "[parameters('dataFactoryName')]",
                        "[concat(variables('factoryId'), '/linkedServices/ArmtemplateStorageLinkedService')]"
                    ]
                },
                {  
                    "name":"[concat(parameters('dataFactoryName'), '/ArmtemplateTestDatasetOut')]",
                    "type":"Microsoft.DataFactory/factories/datasets",
                    "apiVersion":"2018-06-01",
                    "properties":{  
                        "linkedServiceName":{  
                            "referenceName":"ArmtemplateStorageLinkedService",
                            "type":"LinkedServiceReference"
                        },
                        "annotations":[  

                        ],
                        "type":"Binary",
                        "typeProperties":{  
                            "location":{  
                                "type":"AzureBlobStorageLocation",
                                "folderPath":"output",
                                "container":"adftutorial"
                            }
                        }
                    },
                    "dependsOn":[  
                        "[parameters('dataFactoryName')]",
                        "[concat(variables('factoryId'), '/linkedServices/ArmtemplateStorageLinkedService')]"
                    ]
                },
                {  
                    "name":"[concat(parameters('dataFactoryName'), '/ArmtemplateSampleCopyPipeline')]",
                    "type":"Microsoft.DataFactory/factories/pipelines",
                    "apiVersion":"2018-06-01",
                    "properties":{  
                        "activities":[  
                            {  
                                "name":"MyCopyActivity",
                                "type":"Copy",
                                "dependsOn":[  

                                ],
                                "policy":{  
                                    "timeout":"7.00:00:00",
                                    "retry":0,
                                    "retryIntervalInSeconds":30,
                                    "secureOutput":false,
                                    "secureInput":false
                                },
                                "userProperties":[  

                                ],
                                "typeProperties":{  
                                    "source":{  
                                        "type":"BinarySource",
                                        "storeSettings":{  
                                            "type":"AzureBlobStorageReadSettings",
                                            "recursive":true
                                        }
                                    },
                                    "sink":{  
                                        "type":"BinarySink",
                                        "storeSettings":{  
                                            "type":"AzureBlobStorageWriteSettings"
                                        }
                                    },
                                    "enableStaging":false
                                },
                                "inputs":[  
                                    {  
                                        "referenceName":"ArmtemplateTestDatasetIn",
                                        "type":"DatasetReference",
                                        "parameters":{  

                                        }
                                    }
                                ],
                                "outputs":[  
                                    {  
                                        "referenceName":"ArmtemplateTestDatasetOut",
                                        "type":"DatasetReference",
                                        "parameters":{  

                                        }
                                    }
                                ]
                            }
                        ],
                        "annotations":[  

                        ]
                    },
                    "dependsOn":[  
                        "[parameters('dataFactoryName')]",
                        "[concat(variables('factoryId'), '/datasets/ArmtemplateTestDatasetIn')]",
                        "[concat(variables('factoryId'), '/datasets/ArmtemplateTestDatasetOut')]"
                    ]
                },
                {  
                    "name":"[concat(parameters('dataFactoryName'), '/ArmTemplateTestTrigger')]",
                    "type":"Microsoft.DataFactory/factories/triggers",
                    "apiVersion":"2018-06-01",
                    "properties":{  
                        "annotations":[  

                        ],
                        "runtimeState":"Started",
                        "pipelines":[  
                            {  
                                "pipelineReference":{  
                                    "referenceName":"ArmtemplateSampleCopyPipeline",
                                    "type":"PipelineReference"
                                },
                                "parameters":{  

                                }
                            }
                        ],
                        "type":"ScheduleTrigger",
                        "typeProperties":{  
                            "recurrence":{  
                                "frequency":"Hour",
                                "interval":1,
                                "startTime":"[parameters('triggerStartTime')]",
                                "endTime":"[parameters('triggerEndTime')]",
                                "timeZone":"UTC"
                            }
                        }
                    },
                    "dependsOn":[  
                        "[parameters('dataFactoryName')]",
                        "[concat(variables('factoryId'), '/pipelines/ArmtemplateSampleCopyPipeline')]"
                    ]
                }
            ]
        }
    ]
}

Parameters JSON

Create a JSON file named ADFTutorialARM-Parameters.json that contains parameters for the Azure Resource Manager template.

Important

  • Specify the name and key of your Azure Storage account for the storageAccountName and storageAccountKey parameters in this parameter file. You created the adftutorial container and uploaded the sample file (emp.txt) to the input folder in this Azure Blob storage.
  • Specify a globally unique name for the data factory for the dataFactoryName parameter. For example: ARMTutorialFactoryJohnDoe11282017.
  • For the triggerStartTime, specify the current day in the format 2019-09-08T00:00:00.
  • For the triggerEndTime, specify the next day in the format 2019-09-09T00:00:00. You can also check the current UTC time and specify the next hour or two as the end time. For example, if the UTC time now is 1:32 AM, specify 2019-09-08T03:00:00 as the end time. In this case, the trigger runs the pipeline twice (at 2 AM and 3 AM).
{  
    "$schema":"https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion":"1.0.0.0",
    "parameters":{  
        "dataFactoryName":{  
            "value":"<datafactoryname>"
        },
        "dataFactoryLocation":{  
            "value":"China East 2"
        },
        "storageAccountName":{  
            "value":"<yourstorageaccountname>"
        },
        "storageAccountKey":{  
            "value":"<yourstorageaccountkey>"
        },
        "triggerStartTime":{  
            "value":"2019-09-08T11:00:00"
        },
        "triggerEndTime":{  
            "value":"2019-09-08T14:00:00"
        }
    }
}
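
Rather than hand-editing the trigger times, you can compute UTC-aligned values in PowerShell and paste them into the parameter file; a small sketch (the 24-hour window is an arbitrary choice):

# Compute a start time at the top of the current UTC hour and an end time 24 hours later.
$utcNow = (Get-Date).ToUniversalTime()
$triggerStartTime = $utcNow.ToString("yyyy-MM-ddTHH:00:00")
$triggerEndTime = $utcNow.AddHours(24).ToString("yyyy-MM-ddTHH:00:00")
$triggerStartTime
$triggerEndTime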

Important

You may have separate parameter JSON files for development, testing, and production environments to use with the same Data Factory JSON template. By using a PowerShell script, you can automate deploying Data Factory entities in these environments.

Deploy Data Factory entities
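
Before deploying, make sure the target resource group exists, and have the storage account key at hand for the parameter file. A sketch (the names are the examples used throughout this tutorial; skip the first command if you already created the resource group earlier):

# Create the resource group, if it doesn't exist yet.
New-AzResourceGroup -Name "ADFTutorialResourceGroup" -Location "China East 2"

# Retrieve the storage account key to paste into the parameter file (placeholder names).
(Get-AzStorageAccountKey -ResourceGroupName "ADFTutorialResourceGroup" -Name "<yourstorageaccountname>")[0].Value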

In PowerShell, run the following command to deploy the Data Factory entities in your resource group (this example uses ADFTutorialResourceGroup) using the Resource Manager template you created earlier in this quickstart.

New-AzResourceGroupDeployment -Name MyARMDeployment -ResourceGroupName ADFTutorialResourceGroup -TemplateFile C:\ADFTutorial\ADFTutorialARM.json -TemplateParameterFile C:\ADFTutorial\ADFTutorialARM-Parameters.json

You see output similar to the following sample:

DeploymentName          : MyARMDeployment
ResourceGroupName       : ADFTutorialResourceGroup
ProvisioningState       : Succeeded
Timestamp               : 9/8/2019 10:52:29 AM
Mode                    : Incremental
TemplateLink            : 
Parameters              : 
                          Name                   Type                       Value     
                          =====================  =========================  ==========
                          dataFactoryName        String                     <data factory name>
                          dataFactoryLocation    String                     China East 2   
                          storageAccountName     String                     <storage account name>
                          storageAccountKey      SecureString                         
                          triggerStartTime       String                     9/8/2019 11:00:00 AM
                          triggerEndTime         String                     9/8/2019 2:00:00 PM
                          
Outputs                 : 
DeploymentDebugLogLevel : 

Start the trigger

The template deploys the following Data Factory entities:

  • Azure Storage linked service
  • Binary datasets (input and output)
  • Pipeline with a copy activity
  • Trigger to trigger the pipeline

The deployed trigger is in the stopped state. One way to start the trigger is to use the Start-AzDataFactoryV2Trigger PowerShell cmdlet. The following procedure provides detailed steps:

  1. In the PowerShell window, create a variable to hold the name of the resource group. Copy the following command into the PowerShell window, and press ENTER. If you specified a different resource group name for the New-AzResourceGroupDeployment command, update the value here.

    $resourceGroupName = "ADFTutorialResourceGroup"
    
  2. Create a variable to hold the name of the data factory. Specify the same name that you specified in the ADFTutorialARM-Parameters.json file.

    $dataFactoryName = "<yourdatafactoryname>"
    
  3. Set a variable for the name of the trigger. The name of the trigger is hardcoded in the Resource Manager template file (ADFTutorialARM.json).

    $triggerName = "ArmTemplateTestTrigger"
    
  4. Get the status of the trigger by running the following PowerShell command after specifying the names of your data factory and trigger:

    Get-AzDataFactoryV2Trigger -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $triggerName
    

    Here is the sample output:

    
    TriggerName       : ArmTemplateTestTrigger
    ResourceGroupName : ADFTutorialResourceGroup
    DataFactoryName   : ADFQuickstartsDataFactory0905
    Properties        : Microsoft.Azure.Management.DataFactory.Models.ScheduleTrigger
    RuntimeState      : Stopped
    

    Notice that the runtime state of the trigger is Stopped.

  5. Start the trigger. The trigger runs the pipeline defined in the template on the hour. That is, if you execute this command at 2:25 PM, the trigger runs the pipeline at 3 PM for the first time. Then, it runs the pipeline hourly until the end time you specified for the trigger.

    Start-AzDataFactoryV2Trigger -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -TriggerName $triggerName
    

    Here is the sample output:

    Confirm
    Are you sure you want to start trigger 'ArmTemplateTestTrigger' in data factory 'ARMFactory1128'?
    [Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"): y
    True
    
  6. Confirm that the trigger has been started by running the Get-AzDataFactoryV2Trigger command again.

    Get-AzDataFactoryV2Trigger -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -TriggerName $triggerName
    

    Here is the sample output:

    TriggerName       : ArmTemplateTestTrigger
    ResourceGroupName : ADFTutorialResourceGroup
    DataFactoryName   : ADFQuickstartsDataFactory0905
    Properties        : Microsoft.Azure.Management.DataFactory.Models.ScheduleTrigger
    RuntimeState      : Started
    

Monitor the pipeline

  1. After signing in to the Azure portal, click All services, search with a keyword such as data fa, and select Data factories.

  2. In the Data factories page, click the data factory you created. If needed, filter the list with the name of your data factory.

  3. In the Data factory page, click the Author & Monitor tile.

  4. In the Let's get started page, select the Monitor tab.

    Important

    You see pipeline runs only on the hour (for example, 4 AM, 5 AM, 6 AM, and so on). Click Refresh on the toolbar to refresh the list when the time reaches the next hour.

  5. Click the View Activity Runs link in the Actions column.

    Pipeline actions link

  6. You see the activity runs associated with the pipeline run. In this quickstart, the pipeline has only one activity of type Copy. Therefore, you see a run for that activity.

    Activity runs

  7. Click the Output link under the Actions column. You see the output from the copy operation in an Output window. Click the maximize button to see the full output. You can minimize the maximized output window, or close it.

  8. Stop the trigger once you see a successful or failed run. The trigger runs the pipeline once an hour. The pipeline copies the same file from the input folder to the output folder for each run. To stop the trigger, run the following command in the PowerShell window. (A PowerShell alternative to the portal for checking runs follows these steps.)

    Stop-AzDataFactoryV2Trigger -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $triggerName
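
As an alternative to the portal, you can also check runs from PowerShell. A sketch using the Get-AzDataFactoryV2PipelineRun and Get-AzDataFactoryV2ActivityRun cmdlets (the one-day time window is an arbitrary choice):

# List the pipeline runs in the factory over roughly the last day.
$runs = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName `
    -LastUpdatedAfter (Get-Date).AddDays(-1) `
    -LastUpdatedBefore (Get-Date).AddDays(1)
$runs | Format-Table RunId, PipelineName, Status, RunStart

# Drill into the activity runs for the first pipeline run in the list.
Get-AzDataFactoryV2ActivityRun -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName `
    -PipelineRunId $runs[0].RunId `
    -RunStartedAfter (Get-Date).AddDays(-1) `
    -RunStartedBefore (Get-Date).AddDays(1)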
    

Verify the output

The pipeline automatically creates the output folder in the adftutorial blob container. Then, it copies the emp.txt file from the input folder to the output folder. You can verify this in the portal with the steps below, or from PowerShell with the sketch after them.

  1. In the Azure portal, on the adftutorial container page, click Refresh to see the output folder.

    Refresh

  2. Click output in the folder list.

  3. Confirm that emp.txt is copied to the output folder.

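
To verify from PowerShell instead, list the blobs under the output prefix; a sketch reusing the $context from the container sketch earlier:

# List blobs under the output/ prefix; output/emp.txt should appear once the pipeline has run.
Get-AzStorageBlob -Container "adftutorial" -Prefix "output/" -Context $context | Select-Object Name, Length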

Clean up resources

You can clean up the resources that you created in this quickstart in two ways. You can delete the Azure resource group, which includes all the resources in the resource group. If you want to keep the other resources intact, delete only the data factory you created in this tutorial.

Deleting a resource group deletes all resources in it, including any data factories. Run the following command to delete the entire resource group:

Remove-AzResourceGroup -ResourceGroupName $resourceGroupName

Note: deleting a resource group may take some time; please be patient with the process.

If you want to delete just the data factory, not the entire resource group, run the following command:

Remove-AzDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName

JSON definitions for entities

The following Data Factory entities are defined in the JSON template:

Azure Storage linked service

The ArmtemplateStorageLinkedService linked service links your Azure storage account to the data factory. You created a container and uploaded data to this storage account as part of the prerequisites. You specify the name and key of your Azure storage account in this section. See Azure Storage linked service for details about the JSON properties used to define an Azure Storage linked service.

{  
    "name":"[concat(parameters('dataFactoryName'), '/ArmtemplateStorageLinkedService')]",
    "type":"Microsoft.DataFactory/factories/linkedServices",
    "apiVersion":"2018-06-01",
    "properties":{  
        "annotations":[  

        ],
        "type":"AzureBlobStorage",
        "typeProperties":{  
            "connectionString": "[concat('DefaultEndpointsProtocol=https;AccountName=',parameters('storageAccountName'),';AccountKey=',parameters('storageAccountKey'),';EndpointSuffix=core.chinacloudapi.cn')]"
        }
    },
    "dependsOn":[  
        "[parameters('dataFactoryName')]"
    ]
}

The connectionString uses the storageAccountName and storageAccountKey parameters. The values for these parameters are passed in by using a parameter file. The definition also uses the factoryId variable and the dataFactoryName parameter defined in the template.

Binary input dataset

The Azure Storage linked service specifies the connection string that the Data Factory service uses at run time to connect to your Azure storage account. In the Binary dataset definition, you specify the names of the blob container, folder, and file that contains the input data. See Binary dataset properties for details about the JSON properties used to define a Binary dataset.

{  
    "name":"[concat(parameters('dataFactoryName'), '/ArmtemplateTestDatasetIn')]",
    "type":"Microsoft.DataFactory/factories/datasets",
    "apiVersion":"2018-06-01",
    "properties":{  
        "linkedServiceName":{  
            "referenceName":"ArmtemplateStorageLinkedService",
            "type":"LinkedServiceReference"
        },
        "annotations":[  

        ],
        "type":"Binary",
        "typeProperties":{  
            "location":{  
                "type":"AzureBlobStorageLocation",
                "fileName":"emp.txt",
                "folderPath":"input",
                "container":"adftutorial"
            }
        }
    },
    "dependsOn":[  
        "[parameters('dataFactoryName')]",
        "[concat(variables('factoryId'), '/linkedServices/ArmtemplateStorageLinkedService')]"
    ]
}

Binary output dataset

You specify the name of the folder in Azure Blob storage that holds the data copied from the input folder. See Binary dataset properties for details about the JSON properties used to define a Binary dataset.

{  
    "name":"[concat(parameters('dataFactoryName'), '/ArmtemplateTestDatasetOut')]",
    "type":"Microsoft.DataFactory/factories/datasets",
    "apiVersion":"2018-06-01",
    "properties":{  
        "linkedServiceName":{  
            "referenceName":"ArmtemplateStorageLinkedService",
            "type":"LinkedServiceReference"
        },
        "annotations":[  

        ],
        "type":"Binary",
        "typeProperties":{  
            "location":{  
                "type":"AzureBlobStorageLocation",
                "folderPath":"output",
                "container":"adftutorial"
            }
        }
    },
    "dependsOn":[  
        "[parameters('dataFactoryName')]",
        "[concat(variables('factoryId'), '/linkedServices/ArmtemplateStorageLinkedService')]"
    ]
}

Data pipeline

You define a pipeline that copies data from one Binary dataset to another Binary dataset. See Pipeline JSON for descriptions of the JSON elements used to define a pipeline in this example.

{  
    "name":"[concat(parameters('dataFactoryName'), '/ArmtemplateSampleCopyPipeline')]",
    "type":"Microsoft.DataFactory/factories/pipelines",
    "apiVersion":"2018-06-01",
    "properties":{  
        "activities":[  
            {  
                "name":"MyCopyActivity",
                "type":"Copy",
                "dependsOn":[  

                ],
                "policy":{  
                    "timeout":"7.00:00:00",
                    "retry":0,
                    "retryIntervalInSeconds":30,
                    "secureOutput":false,
                    "secureInput":false
                },
                "userProperties":[  

                ],
                "typeProperties":{  
                    "source":{  
                        "type":"BinarySource",
                        "storeSettings":{  
                            "type":"AzureBlobStorageReadSettings",
                            "recursive":true
                        }
                    },
                    "sink":{  
                        "type":"BinarySink",
                        "storeSettings":{  
                            "type":"AzureBlobStorageWriteSettings"
                        }
                    },
                    "enableStaging":false
                },
                "inputs":[  
                    {  
                        "referenceName":"ArmtemplateTestDatasetIn",
                        "type":"DatasetReference",
                        "parameters":{  

                        }
                    }
                ],
                "outputs":[  
                    {  
                        "referenceName":"ArmtemplateTestDatasetOut",
                        "type":"DatasetReference",
                        "parameters":{  

                        }
                    }
                ]
            }
        ],
        "annotations":[  

        ]
    },
    "dependsOn":[  
        "[parameters('dataFactoryName')]",
        "[concat(variables('factoryId'), '/datasets/ArmtemplateTestDatasetIn')]",
        "[concat(variables('factoryId'), '/datasets/ArmtemplateTestDatasetOut')]"
    ]
}

Trigger

You define a trigger that runs the pipeline once an hour. The deployed trigger is in the stopped state. Start the trigger by using the Start-AzDataFactoryV2Trigger cmdlet. For more information about triggers, see the Pipeline execution and triggers article.

{  
    "name":"[concat(parameters('dataFactoryName'), '/ArmTemplateTestTrigger')]",
    "type":"Microsoft.DataFactory/factories/triggers",
    "apiVersion":"2018-06-01",
    "properties":{  
        "annotations":[  

        ],
        "runtimeState":"Started",
        "pipelines":[  
            {  
                "pipelineReference":{  
                    "referenceName":"ArmtemplateSampleCopyPipeline",
                    "type":"PipelineReference"
                },
                "parameters":{  

                }
            }
        ],
        "type":"ScheduleTrigger",
        "typeProperties":{  
            "recurrence":{  
                "frequency":"Hour",
                "interval":1,
                "startTime":"[parameters('triggerStartTime')]",
                "endTime":"[parameters('triggerEndTime')]",
                "timeZone":"UTC"
            }
        }
    },
    "dependsOn":[  
        "[parameters('dataFactoryName')]",
        "[concat(variables('factoryId'), '/pipelines/ArmtemplateSampleCopyPipeline')]"
    ]
}

Reuse the template

In the tutorial, you created a template for defining Data Factory entities and a template for passing values for parameters. To use the same template to deploy Data Factory entities to different environments, you create a parameter file for each environment and use it when deploying to that environment.

Example:

New-AzResourceGroupDeployment -Name MyARMDeployment -ResourceGroupName ADFTutorialResourceGroup -TemplateFile ADFTutorialARM.json -TemplateParameterFile ADFTutorialARM-Parameters-Dev.json

New-AzResourceGroupDeployment -Name MyARMDeployment -ResourceGroupName ADFTutorialResourceGroup -TemplateFile ADFTutorialARM.json -TemplateParameterFile ADFTutorialARM-Parameters-Test.json

New-AzResourceGroupDeployment -Name MyARMDeployment -ResourceGroupName ADFTutorialResourceGroup -TemplateFile ADFTutorialARM.json -TemplateParameterFile ADFTutorialARM-Parameters-Production.json

Notice that the first command uses the parameter file for the development environment, the second one for the test environment, and the third one for the production environment. (A loop form of these commands is sketched below.)

You can also reuse the template to perform repeated tasks. For example, you might create many data factories with one or more pipelines that implement the same logic, where each data factory uses a different Azure storage account. In this scenario, you use the same template in the same environment (dev, test, or production) with different parameter files to create the data factories.
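
For example, the per-environment deployments shown above can be driven by a simple loop (the parameter file names are the ones from the example):

# Deploy the same template once per environment-specific parameter file.
foreach ($paramFile in @("ADFTutorialARM-Parameters-Dev.json", "ADFTutorialARM-Parameters-Test.json", "ADFTutorialARM-Parameters-Production.json")) {
    New-AzResourceGroupDeployment -Name "MyARMDeployment" `
        -ResourceGroupName "ADFTutorialResourceGroup" `
        -TemplateFile "ADFTutorialARM.json" `
        -TemplateParameterFile $paramFile
}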

Next steps

The pipeline in this sample copies data from one location to another location in Azure Blob storage. Go through the tutorials to learn about using Data Factory in more scenarios.