Copy data to or from Azure Data Explorer by using Azure Data Factory

Applies to: Azure Data Factory, Azure Synapse Analytics (Preview)

This article describes how to use the copy activity in Azure Data Factory to copy data to or from Azure Data Explorer. It builds on the copy activity overview article, which offers a general overview of the copy activity.

Tip

For more information about integrating Azure Data Factory with Azure Data Explorer in general, see Integrate Azure Data Explorer with Azure Data Factory.

Supported capabilities

This Azure Data Explorer connector is supported for the following activities:

  • Copy activity
  • Lookup activity

You can copy data from any supported source data store to Azure Data Explorer. You can also copy data from Azure Data Explorer to any supported sink data store. For a list of data stores that the copy activity supports as sources or sinks, see the Supported data stores table.

Note

Copying data to or from Azure Data Explorer through an on-premises data store by using the self-hosted integration runtime is supported in version 3.14 and later.

With the Azure Data Explorer connector, you can do the following:

  • Copy data by using Azure Active Directory (Azure AD) application token authentication with a service principal.
  • As a source, retrieve data by using a KQL (Kusto) query.
  • As a sink, append data to a destination table.

Getting started

To perform the copy activity with a pipeline, you can use any of the Data Factory tools or SDKs.

The following sections provide details about the properties that are used to define Data Factory entities specific to the Azure Data Explorer connector.

Linked service properties

The Azure Data Explorer connector uses service principal authentication. Follow these steps to get a service principal and to grant permissions:

  1. Register an application entity in Azure Active Directory by following the steps in Register your application with an Azure AD tenant. Make note of the following values, which you use to define the linked service:

    • Application ID
    • Application key
    • Tenant ID
  2. Grant the service principal the correct permissions in Azure Data Explorer. For detailed information about roles and permissions and about managing permissions, see Manage Azure Data Explorer database permissions. In general, you must:

    • As source, grant at least the Database viewer role to your database.
    • As sink, grant at least the Database ingestor role to your database.
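The role grants in step 2 can be issued as Kusto management commands. The following is a minimal sketch that only formats the command text; the helper name `grant_command` is hypothetical, and the `.add database ... viewers/ingestors ('aadapp=<application ID>;<tenant ID>')` syntax comes from the Kusto security-role commands rather than from this article, so verify it against the permissions documentation before use.

```python
# Sketch: format the Kusto management command that adds a service principal
# to a database role. Assumes the database name is a simple identifier that
# needs no bracket quoting.

# Roles this article asks for: viewer (source) and ingestor (sink).
ALLOWED_ROLES = {"viewers", "ingestors"}


def grant_command(database: str, role: str, app_id: str, tenant_id: str) -> str:
    """Return the command text; running it against the cluster is up to the caller."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {sorted(ALLOWED_ROLES)}, got {role!r}")
    # 'aadapp=<application ID>;<tenant ID>' identifies an Azure AD application.
    return f".add database {database} {role} ('aadapp={app_id};{tenant_id}')"


if __name__ == "__main__":
    # Placeholders only; substitute the values noted in step 1.
    print(grant_command("MyDatabase", "viewers", "<application ID>", "<tenant ID>"))
    print(grant_command("MyDatabase", "ingestors", "<application ID>", "<tenant ID>"))
```

Run the resulting command in a query window connected to the cluster, using an account that has permission to manage the database.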

Note

When you use the Data Factory UI to author, your login user account is used to list Azure Data Explorer clusters, databases, and tables. If you don't have permission for these operations, enter the names manually.

The following properties are supported for the Azure Data Explorer linked service:

| Property | Description | Required |
|---|---|---|
| type | The type property must be set to AzureDataExplorer. | Yes |
| endpoint | Endpoint URL of the Azure Data Explorer cluster, in the format https://<clusterName>.<regionName>.kusto.chinacloudapi.cn. | Yes |
| database | Name of the database. | Yes |
| tenant | The tenant information (domain name or tenant ID) under which your application resides. This is known as "Authority ID" in the Kusto connection string. Retrieve it by hovering the mouse pointer in the upper-right corner of the Azure portal. | Yes |
| servicePrincipalId | The application's client ID. This is known as "AAD application client ID" in the Kusto connection string. | Yes |
| servicePrincipalKey | The application's key. This is known as "AAD application key" in the Kusto connection string. Mark this field as a SecureString to store it securely in Data Factory, or reference secure data stored in Azure Key Vault. | Yes |

Linked service properties example:

{
    "name": "AzureDataExplorerLinkedService",
    "properties": {
        "type": "AzureDataExplorer",
        "typeProperties": {
            "endpoint": "https://<clusterName>.<regionName>.kusto.chinacloudapi.cn",
            "database": "<database name>",
            "tenant": "<tenant name/id e.g. microsoft.partner.onmschina.cn>",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            }
        }
    }
}
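The example above stores the key inline as a SecureString. As a sketch of the Azure Key Vault alternative mentioned in the property table, the following builds the same linked service definition with the key taken from a vault. The function name `build_linked_service` and its parameters are hypothetical, and the `AzureKeyVaultSecret` reference shape follows the general Data Factory convention for Key Vault references, which this article does not show.

```python
import json


def build_linked_service(name, cluster, region, database, tenant,
                         service_principal_id, key_vault_linked_service, secret_name):
    """Return an Azure Data Explorer linked service definition as a dict."""
    # Endpoint format taken from the property table in this article.
    endpoint = f"https://{cluster}.{region}.kusto.chinacloudapi.cn"
    return {
        "name": name,
        "properties": {
            "type": "AzureDataExplorer",
            "typeProperties": {
                "endpoint": endpoint,
                "database": database,
                "tenant": tenant,
                "servicePrincipalId": service_principal_id,
                # Assumption: Key Vault reference shape used elsewhere in Data Factory.
                "servicePrincipalKey": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
            },
        },
    }


if __name__ == "__main__":
    definition = build_linked_service(
        "AzureDataExplorerLinkedService", "<clusterName>", "<regionName>",
        "<database name>", "<tenant name/id>", "<service principal id>",
        "<Azure Key Vault linked service name>", "<secret name>",
    )
    print(json.dumps(definition, indent=4))
```

Keeping the key in a vault means the linked service JSON holds no secret value, only the name of the secret to look up.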

Dataset properties

For a full list of sections and properties available for defining datasets, see Datasets in Azure Data Factory. This section lists the properties that the Azure Data Explorer dataset supports.

To copy data to Azure Data Explorer, set the type property of the dataset to AzureDataExplorerTable.

The following properties are supported:

| Property | Description | Required |
|---|---|---|
| type | The type property must be set to AzureDataExplorerTable. | Yes |
| table | The name of the table that the linked service refers to. | Yes for sink; No for source |

Dataset properties example:

{
   "name": "AzureDataExplorerDataset",
    "properties": {
        "type": "AzureDataExplorerTable",
        "typeProperties": {
            "table": "<table name>"
        },
        "schema": [],
        "linkedServiceName": {
            "referenceName": "<Azure Data Explorer linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}
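Because `table` is required only when the dataset is used as a sink, a definition generator can enforce that rule before deployment. The sketch below is illustrative: `build_dataset` and its `used_as` parameter are hypothetical names, and the output mirrors the example above.

```python
def build_dataset(name, linked_service_name, table=None, used_as="source"):
    """Return an AzureDataExplorerTable dataset definition as a dict.

    `table` is mandatory for a sink dataset and optional for a source dataset,
    matching the Required column of the property table.
    """
    if used_as not in ("source", "sink"):
        raise ValueError("used_as must be 'source' or 'sink'")
    if used_as == "sink" and not table:
        raise ValueError("a sink dataset must name the destination table")

    # Omit typeProperties.table entirely when no table is given (source only).
    type_properties = {"table": table} if table else {}
    return {
        "name": name,
        "properties": {
            "type": "AzureDataExplorerTable",
            "typeProperties": type_properties,
            "schema": [],
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
        },
    }


if __name__ == "__main__":
    print(build_dataset("AzureDataExplorerDataset", "<linked service name>",
                        table="<table name>", used_as="sink"))
```

A source dataset can leave the table out because the copy activity source supplies a KQL query instead.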

Copy activity properties

For a full list of sections and properties available for defining activities, see Pipelines and activities in Azure Data Factory. This section provides a list of properties that Azure Data Explorer sources and sinks support.

Azure Data Explorer as source

To copy data from Azure Data Explorer, set the type property in the copy activity source to AzureDataExplorerSource. The following properties are supported in the copy activity source section:

| Property | Description | Required |
|---|---|---|
| type | The type property of the copy activity source must be set to AzureDataExplorerSource. | Yes |
| query | A read-only request given in KQL format. Use the custom KQL query as a reference. | Yes |
| queryTimeout | The wait time before the query request times out. The default value is 10 minutes (00:10:00); the maximum allowed value is 1 hour (01:00:00). | No |
| noTruncation | Indicates whether to truncate the returned result set. By default, the result is truncated after 500,000 records or 64 megabytes (MB). Truncation is strongly recommended to ensure the correct behavior of the activity. | No |

Note

By default, the Azure Data Explorer source has a size limit of 500,000 records or 64 MB. To retrieve all the records without truncation, you can specify set notruncation; at the beginning of your query. For more information, see Query limits.
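The timeout limit and the `set notruncation;` prefix described above can be applied when the source section is generated. This is a minimal sketch under stated assumptions: `build_source`, `without_truncation`, and `parse_timespan` are hypothetical helpers, and the timeout is assumed to use the hh:mm:ss form shown in the table.

```python
from datetime import timedelta

# Maximum queryTimeout allowed by the property table (01:00:00).
MAX_TIMEOUT = timedelta(hours=1)


def parse_timespan(value: str) -> timedelta:
    """Parse an 'hh:mm:ss' string; raises ValueError on any other shape."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def without_truncation(query: str) -> str:
    """Prefix the query with 'set notruncation;' unless it already starts with it."""
    if query.lstrip().lower().startswith("set notruncation;"):
        return query
    return "set notruncation;\n" + query


def build_source(query: str, query_timeout: str = "00:10:00", retrieve_all: bool = False) -> dict:
    """Return the copy activity source section as a dict."""
    if parse_timespan(query_timeout) > MAX_TIMEOUT:
        raise ValueError("queryTimeout cannot exceed 01:00:00")
    return {
        "type": "AzureDataExplorerSource",
        "query": without_truncation(query) if retrieve_all else query,
        "queryTimeout": query_timeout,
    }


if __name__ == "__main__":
    print(build_source("TestTable1 | take 10"))
    print(build_source("TestTable1", query_timeout="00:30:00", retrieve_all=True))
```

Leave `retrieve_all` off unless the full result set is really needed, since the article recommends keeping truncation in place.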

Example:

"activities":[
    {
        "name": "CopyFromAzureDataExplorer",
        "type": "Copy",
        "typeProperties": {
            "source": {
                "type": "AzureDataExplorerSource",
                "query": "TestTable1 | take 10",
                "queryTimeout": "00:10:00"
            },
            "sink": {
                "type": "<sink type>"
            }
        },
        "inputs": [
            {
                "referenceName": "<Azure Data Explorer input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ]
    }
]

Azure Data Explorer as sink

To copy data to Azure Data Explorer, set the type property in the copy activity sink to AzureDataExplorerSink. The following properties are supported in the copy activity sink section:

| Property | Description | Required |
|---|---|---|
| type | The type property of the copy activity sink must be set to AzureDataExplorerSink. | Yes |
| ingestionMappingName | Name of a pre-created mapping on a Kusto table. To map the columns from the source to Azure Data Explorer (which applies to all supported source stores and formats, including CSV, JSON, and Avro), you can use the copy activity column mapping (implicitly by name or explicitly as configured), Azure Data Explorer mappings, or both. | No |
| additionalProperties | A property bag that can be used to specify any ingestion properties that the Azure Data Explorer sink doesn't already set. Specifically, it can be useful for specifying ingestion tags. Learn more from the Azure Data Explorer data ingestion doc. | No |

Example:

"activities":[
    {
        "name": "CopyToAzureDataExplorer",
        "type": "Copy",
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "AzureDataExplorerSink",
                "ingestionMappingName": "<optional Azure Data Explorer mapping name>",
                "additionalProperties": {<additional settings for data ingestion>}
            }
        },
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<Azure Data Explorer output dataset name>",
                "type": "DatasetReference"
            }
        ]
    }
]

Lookup activity properties

For more information about the properties, see Lookup activity.
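As an illustration, a Lookup activity can reuse the same AzureDataExplorerSource settings as the copy activity. The sketch below is an assumption rather than part of this article: `build_lookup_activity` is a hypothetical helper, and the `source`, `dataset`, and `firstRowOnly` layout follows the general Lookup activity definition.

```python
import json


def build_lookup_activity(name, dataset_name, query, first_row_only=True):
    """Return a Lookup activity definition that queries Azure Data Explorer."""
    return {
        "name": name,
        "type": "Lookup",
        "typeProperties": {
            # Same source type and query property as the copy activity source.
            "source": {
                "type": "AzureDataExplorerSource",
                "query": query,
            },
            # Assumption: dataset reference and firstRowOnly as in the Lookup activity article.
            "dataset": {
                "referenceName": dataset_name,
                "type": "DatasetReference",
            },
            "firstRowOnly": first_row_only,
        },
    }


if __name__ == "__main__":
    activity = build_lookup_activity(
        "LookupFromAzureDataExplorer",
        "<Azure Data Explorer dataset name>",
        "TestTable1 | take 1",
    )
    print(json.dumps(activity, indent=4))
```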

Next steps