Linked services in Azure Data Factory

Applies to: Azure Data Factory, Azure Synapse Analytics (Preview)

This article describes what linked services are, how they're defined in JSON format, and how they're used in Azure Data Factory pipelines.

If you're new to Data Factory, see Introduction to Azure Data Factory for an overview.

Overview

A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. The activities in a pipeline define actions to perform on your data. For example, you might use a copy activity to copy data from SQL Server to Azure Blob storage. Then, you might use a Hive activity that runs a Hive script on an Azure HDInsight cluster to process data from Blob storage and produce output data. Finally, you might use a second copy activity to copy the output data to Azure Synapse Analytics (formerly SQL Data Warehouse), on top of which business intelligence (BI) reporting solutions are built. For more information about pipelines and activities, see Pipelines and activities in Azure Data Factory.

Now, a dataset is a named view of data that simply points to or references the data you want to use in your activities as inputs and outputs.

Before you create a dataset, you must create a linked service to link your data store to the data factory. Linked services are much like connection strings, which define the connection information needed for Data Factory to connect to external resources. Think of it this way: the dataset represents the structure of the data within the linked data stores, and the linked service defines the connection to the data source. For example, an Azure Storage linked service links a storage account to the data factory. An Azure Blob dataset represents the blob container and the folder within that Azure Storage account that contains the input blobs to be processed.

Here is a sample scenario. To copy data from Blob storage to a SQL Database, you create two linked services: Azure Storage and Azure SQL Database. Then, you create two datasets: an Azure Blob dataset (which refers to the Azure Storage linked service) and an Azure SQL Table dataset (which refers to the Azure SQL Database linked service). The Azure Storage and Azure SQL Database linked services contain connection strings that Data Factory uses at runtime to connect to your Azure Storage and Azure SQL Database, respectively. The Azure Blob dataset specifies the blob container and blob folder that contain the input blobs in your Blob storage. The Azure SQL Table dataset specifies the SQL table in your SQL Database to which the data is to be copied.
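The scenario above can be sketched in Python as plain dictionaries that mirror the JSON definitions. This is only an illustration, not the official definitions: the resource names, the dataset `typeProperties` (`folderPath`, `tableName`), and the helper functions are assumptions made for this example. The point it shows is that each dataset points at its linked service by name through a `linkedServiceName` reference, while the connection string lives only in the linked service.

```python
def linked_service_reference(name):
    """Build the reference object a dataset uses to point at a linked service."""
    return {"referenceName": name, "type": "LinkedServiceReference"}


# Two linked services: each one holds only connection information.
# Names and placeholder connection strings are hypothetical.
linked_services = [
    {
        "name": "AzureStorageLinkedService",
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {"connectionString": "<storage connection string>"},
        },
    },
    {
        "name": "AzureSqlDatabaseLinkedService",
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {"connectionString": "<SQL connection string>"},
        },
    },
]

# Two datasets: each one describes the data and refers to a linked service.
# The typeProperties shown here are illustrative assumptions.
datasets = [
    {
        "name": "InputBlobDataset",
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": linked_service_reference("AzureStorageLinkedService"),
            "typeProperties": {"folderPath": "<container>/<folder>"},
        },
    },
    {
        "name": "OutputSqlTableDataset",
        "properties": {
            "type": "AzureSqlTable",
            "linkedServiceName": linked_service_reference("AzureSqlDatabaseLinkedService"),
            "typeProperties": {"tableName": "<table name>"},
        },
    },
]


def unresolved_references(linked_services, datasets):
    """Return names of datasets whose linked service is not defined."""
    known = {service["name"] for service in linked_services}
    return [
        dataset["name"]
        for dataset in datasets
        if dataset["properties"]["linkedServiceName"]["referenceName"] not in known
    ]
```

Calling `unresolved_references(linked_services, datasets)` returns an empty list when every dataset refers to a linked service that exists, which is the ordering constraint described above: linked services first, datasets second.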

The following diagram shows the relationships among pipeline, activity, dataset, and linked service in Data Factory:

[Figure: Relationship between pipeline, activity, dataset, and linked service]

Linked service JSON

A linked service in Data Factory is defined in JSON format as follows:

{
    "name": "<Name of the linked service>",
    "properties": {
        "type": "<Type of the linked service>",
        "typeProperties": {
              "<data store or compute-specific type properties>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

The following table describes the properties in the JSON above:

Property | Description | Required
name | Name of the linked service. See Azure Data Factory - Naming rules. | Yes
type | Type of the linked service. For example: AzureBlobStorage (data store) or AzureBatch (compute). See the description for typeProperties. | Yes
typeProperties | The type properties are different for each data store or compute. For the supported data store types and their type properties, see the connector overview article; navigate to the data store connector article to learn about type properties specific to a data store. For the supported compute types and their type properties, see Compute linked services. | Yes
connectVia | The Integration Runtime to be used to connect to the data store. You can use Azure Integration Runtime or Self-hosted Integration Runtime (if your data store is located in a private network). If not specified, it uses the default Azure Integration Runtime. | No
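The required and optional properties in the table can be checked with a small structural validator. The following is a minimal sketch, assuming the definition has already been parsed into a Python dictionary; `validate_linked_service` is a hypothetical helper written for this example, not part of any Data Factory SDK, and it checks only the shape described in the table, not the type-specific properties.

```python
def validate_linked_service(definition):
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []

    # "name" is required.
    if not definition.get("name"):
        problems.append("missing required property: name")

    properties = definition.get("properties")
    if not isinstance(properties, dict):
        problems.append("missing required object: properties")
        return problems

    # "type" and "typeProperties" are required.
    if not properties.get("type"):
        problems.append("missing required property: properties.type")
    if not isinstance(properties.get("typeProperties"), dict):
        problems.append("missing required object: properties.typeProperties")

    # "connectVia" is optional; when omitted, the default Azure
    # Integration Runtime is used, so its absence is not a problem.
    connect_via = properties.get("connectVia")
    if connect_via is not None:
        if not isinstance(connect_via, dict) or not connect_via.get("referenceName"):
            problems.append("connectVia.referenceName is missing or empty")
        elif connect_via.get("type") != "IntegrationRuntimeReference":
            problems.append("connectVia.type must be IntegrationRuntimeReference")

    return problems
```

A definition with `name`, `type`, and `typeProperties` but no `connectVia` passes, which matches the table: only `connectVia` is optional.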

Linked service example

The following linked service is an Azure Blob storage linked service. Notice that the type is set to AzureBlobStorage. The type properties for the Azure Blob storage linked service include a connection string. The Data Factory service uses this connection string to connect to the data store at runtime.

{
    "name": "AzureBlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>;EndpointSuffix=core.chinacloudapi.cn"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
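A definition like the one above can also be generated rather than written by hand. The sketch below is one possible way to do that in Python; the function name `build_blob_linked_service` and its parameters are assumptions made for this example. It assembles the connection string from its parts and adds `connectVia` only when an Integration Runtime name is supplied.

```python
import json


def build_blob_linked_service(name, account_name, account_key,
                              endpoint_suffix="core.chinacloudapi.cn",
                              integration_runtime=None):
    """Build an Azure Blob storage linked service definition as a dict."""
    connection_string = (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        f"EndpointSuffix={endpoint_suffix}"
    )
    properties = {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": connection_string},
    }
    # connectVia is optional; leaving it out selects the default
    # Azure Integration Runtime.
    if integration_runtime is not None:
        properties["connectVia"] = {
            "referenceName": integration_runtime,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "properties": properties}


if __name__ == "__main__":
    definition = build_blob_linked_service(
        "AzureBlobStorageLinkedService", "<accountname>", "<accountkey>"
    )
    print(json.dumps(definition, indent=4))
```

The default `endpoint_suffix` simply repeats the value from the example above; pass a different suffix if your storage account lives in another Azure environment. In practice the account key would come from a secret store rather than a literal argument.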

Create linked services

Linked services can be created in the Azure Data Factory UX via the management hub, or from any activities or datasets that reference them.

You can create linked services by using one of these tools or SDKs: .NET API, PowerShell, REST API, Azure Resource Manager template, and Azure portal.
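As a rough illustration of the REST API option, the sketch below prepares (but does not send) a request that creates or updates a linked service. It rests on several assumptions that should be checked against the REST API reference: the resource path, the `2018-06-01` API version, and the default management endpoint, which differs between Azure environments. The function name, its parameters, and the way the access token is obtained are hypothetical.

```python
import json
import urllib.request


def build_linked_service_request(subscription_id, resource_group, factory_name,
                                 definition, access_token,
                                 management_endpoint="https://management.azure.com",
                                 api_version="2018-06-01"):
    """Prepare a PUT request for a linked service; the caller sends it."""
    # The linked service name goes in the URL; the body carries "properties".
    url = (
        f"{management_endpoint}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        f"/linkedservices/{definition['name']}?api-version={api_version}"
    )
    body = json.dumps({"properties": definition["properties"]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately keeps the example free of network access and makes the URL and body easy to inspect first.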

Data store linked services

You can find the list of data stores supported by Data Factory in the connector overview article. Click a data store to learn the supported connection properties.

Compute linked services

See Compute environments supported for details about the different compute environments you can connect to from your data factory, as well as their configurations.

Next steps

See the following tutorial for step-by-step instructions for creating pipelines and datasets by using one of these tools or SDKs.