Copy data to or from Azure Data Explorer by using Azure Data Factory
Applies to: Azure Data Factory, Azure Synapse Analytics
This article describes how to use the copy activity in Azure Data Factory to copy data to or from Azure Data Explorer. It builds on the copy activity overview article, which offers a general overview of the copy activity.
Tip
To learn more about Azure Data Factory and Azure Data Explorer integration in general, see Integrate Azure Data Explorer with Azure Data Factory.
Supported capabilities
This Azure Data Explorer connector is supported for the following activities:
- Copy activity
- Lookup activity
You can copy data from any supported source data store to Azure Data Explorer. You can also copy data from Azure Data Explorer to any supported sink data store. For a list of data stores that the copy activity supports as sources or sinks, see the Supported data stores table.
Note
Copying data to or from Azure Data Explorer through an on-premises data store by using the self-hosted integration runtime is supported in version 3.14 and later.
With the Azure Data Explorer connector, you can do the following:
- Copy data by using Azure Active Directory (Azure AD) application token authentication with a service principal.
- As a source, retrieve data by using a KQL (Kusto) query.
- As a sink, append data to a destination table.
Getting started
Tip
For a walkthrough of the Azure Data Explorer connector, see Copy data to/from Azure Data Explorer using Azure Data Factory and Bulk copy from a database to Azure Data Explorer.
To perform the copy activity with a pipeline, you can use one of the following tools or SDKs:
- The Copy Data tool
- The Azure portal
- The .NET SDK
- The Python SDK
- Azure PowerShell
- The REST API
- The Azure Resource Manager template
The following sections provide details about the properties that are used to define Data Factory entities specific to the Azure Data Explorer connector.
Linked service properties
The Azure Data Explorer connector uses service principal authentication. Follow these steps to get a service principal and to grant permissions:
1. Register an application entity in Azure Active Directory by following the steps in Register your application with an Azure AD tenant. Make note of the following values, which you use to define the linked service:
- Application ID
- Application key
- Tenant ID
2. Grant the service principal the correct permissions in Azure Data Explorer. For detailed information about roles and permissions and about managing permissions, see Manage Azure Data Explorer database permissions. In general, you must:
- As a source, grant at least the Database viewer role to your database.
- As a sink, grant at least the Database ingestor role to your database.
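The role grants described above can be issued as Kusto management commands. The following Python sketch only formats the command text and does not connect to a cluster. It assumes the `.add database ... viewers` / `ingestors` command form and the `aadapp=<application ID>;<tenant ID>` principal format, and it assumes a simple database name that needs no quoting. The function name `grant_role_command` is hypothetical.

```python
def grant_role_command(database: str, role: str, app_id: str, tenant_id: str) -> str:
    """Format a Kusto management command that grants a database role
    to an Azure AD application (service principal).

    Assumption: `role` is the plural role keyword used by the command,
    "viewers" for a source or "ingestors" for a sink.
    """
    allowed = {"viewers", "ingestors"}
    if role not in allowed:
        raise ValueError(f"role must be one of {sorted(allowed)}")
    return f".add database {database} {role} ('aadapp={app_id};{tenant_id}')"


if __name__ == "__main__":
    # Placeholder values; substitute the values noted during app registration.
    print(grant_role_command("<database name>", "viewers", "<application ID>", "<tenant ID>"))
    print(grant_role_command("<database name>", "ingestors", "<application ID>", "<tenant ID>"))
```

Run the resulting command text against the target database with an account that is allowed to manage database permissions.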
Note
When you use the Data Factory UI to author, your login user account is used to list Azure Data Explorer clusters, databases, and tables. If you don't have permission for these operations, enter the names manually.
The following properties are supported for the Azure Data Explorer linked service:
Property | Description | Required |
---|---|---|
type | The type property must be set to AzureDataExplorer. | Yes |
endpoint | Endpoint URL of the Azure Data Explorer cluster, in the format https://<clusterName>.<regionName>.kusto.chinacloudapi.cn. | Yes |
database | Name of the database. | Yes |
tenant | Specify the tenant information (domain name or tenant ID) under which your application resides. This is known as "Authority ID" in the Kusto connection string. Retrieve it by hovering the mouse pointer in the upper-right corner of the Azure portal. | Yes |
servicePrincipalId | Specify the application's client ID. This is known as "AAD application client ID" in the Kusto connection string. | Yes |
servicePrincipalKey | Specify the application's key. This is known as "AAD application key" in the Kusto connection string. Mark this field as a SecureString to store it securely in Data Factory, or reference secure data stored in Azure Key Vault. | Yes |
Linked service properties example:
{
    "name": "AzureDataExplorerLinkedService",
    "properties": {
        "type": "AzureDataExplorer",
        "typeProperties": {
            "endpoint": "https://<clusterName>.<regionName>.kusto.chinacloudapi.cn",
            "database": "<database name>",
            "tenant": "<tenant name/id e.g. microsoft.partner.onmschina.cn>",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            }
        }
    }
}
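The servicePrincipalKey property can also reference a secret stored in Azure Key Vault instead of an inline SecureString. The following Python sketch assembles such a linked service definition and prints it as JSON. It assumes the usual Data Factory `AzureKeyVaultSecret` reference shape (`store` plus `secretName`); the function name `build_linked_service` and all placeholder values are hypothetical.

```python
import json


def build_linked_service(name, endpoint, database, tenant,
                         service_principal_id, key_vault_linked_service, secret_name):
    """Build an Azure Data Explorer linked service definition whose key
    is read from Azure Key Vault (assumed AzureKeyVaultSecret shape)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDataExplorer",
            "typeProperties": {
                "endpoint": endpoint,
                "database": database,
                "tenant": tenant,
                "servicePrincipalId": service_principal_id,
                "servicePrincipalKey": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        # Name of an existing Azure Key Vault linked service.
                        "referenceName": key_vault_linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
            },
        },
    }


if __name__ == "__main__":
    definition = build_linked_service(
        name="AzureDataExplorerLinkedService",
        endpoint="https://<clusterName>.<regionName>.kusto.chinacloudapi.cn",
        database="<database name>",
        tenant="<tenant name/id>",
        service_principal_id="<service principal id>",
        key_vault_linked_service="<Azure Key Vault linked service name>",
        secret_name="<secret name>",
    )
    print(json.dumps(definition, indent=4))
```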
Dataset properties
For a full list of sections and properties available for defining datasets, see Datasets in Azure Data Factory. This section lists the properties that the Azure Data Explorer dataset supports.
To copy data to Azure Data Explorer, set the type property of the dataset to AzureDataExplorerTable.
The following properties are supported:
Property | Description | Required |
---|---|---|
type | The type property must be set to AzureDataExplorerTable. | Yes |
table | The name of the table that the linked service refers to. | Yes for sink; No for source |
Dataset properties example:
{
    "name": "AzureDataExplorerDataset",
    "properties": {
        "type": "AzureDataExplorerTable",
        "typeProperties": {
            "table": "<table name>"
        },
        "schema": [],
        "linkedServiceName": {
            "referenceName": "<Azure Data Explorer linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}
Copy activity properties
For a full list of sections and properties available for defining activities, see Pipelines and activities in Azure Data Factory. This section provides a list of properties that Azure Data Explorer sources and sinks support.
Azure Data Explorer as source
To copy data from Azure Data Explorer, set the type property in the copy activity source to AzureDataExplorerSource. The following properties are supported in the copy activity source section:
Property | Description | Required |
---|---|---|
type | The type property of the copy activity source must be set to: AzureDataExplorerSource | Yes |
query | A read-only request given in KQL format. Use the custom KQL query as a reference. | Yes |
queryTimeout | The wait time before the query request times out. The default value is 10 minutes (00:10:00); the maximum allowed value is 1 hour (01:00:00). | No |
noTruncation | Indicates whether to truncate the returned result set. By default, the result is truncated after 500,000 records or 64 megabytes (MB). Truncation is strongly recommended to ensure the correct behavior of the activity. | No |
Note
By default, the Azure Data Explorer source has a size limit of 500,000 records or 64 MB. To retrieve all the records without truncation, you can specify set notruncation; at the beginning of your query. For more information, see Query limits.
Example:
"activities":[
    {
        "name": "CopyFromAzureDataExplorer",
        "type": "Copy",
        "typeProperties": {
            "source": {
                "type": "AzureDataExplorerSource",
                "query": "TestTable1 | take 10",
                "queryTimeout": "00:10:00"
            },
            "sink": {
                "type": "<sink type>"
            }
        },
        "inputs": [
            {
                "referenceName": "<Azure Data Explorer input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ]
    }
]
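As a rough illustration of the source settings, the Python sketch below builds the source section of a copy activity. It formats queryTimeout as hh:mm:ss, rejects values above the one-hour maximum stated in the table, and optionally prepends set notruncation; to the query. The helper names `format_timeout` and `build_source` are hypothetical. Keep in mind that truncation is the recommended default, so the opt-out is off unless requested.

```python
from datetime import timedelta

MAX_QUERY_TIMEOUT = timedelta(hours=1)      # maximum allowed value per the property table
DEFAULT_QUERY_TIMEOUT = timedelta(minutes=10)


def format_timeout(timeout: timedelta) -> str:
    """Render a timedelta as hh:mm:ss, enforcing the documented 1-hour cap.
    Fractional seconds are dropped."""
    if timeout <= timedelta(0) or timeout > MAX_QUERY_TIMEOUT:
        raise ValueError("queryTimeout must be greater than 0 and at most 01:00:00")
    hours, remainder = divmod(int(timeout.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def build_source(query: str, timeout: timedelta = DEFAULT_QUERY_TIMEOUT,
                 no_truncation: bool = False) -> dict:
    """Build the copy activity source section for Azure Data Explorer."""
    prefix = "set notruncation;"
    if no_truncation and not query.lstrip().lower().startswith(prefix):
        # Opt out of the 500,000-record / 64-MB result limit.
        query = f"{prefix} {query}"
    return {
        "type": "AzureDataExplorerSource",
        "query": query,
        "queryTimeout": format_timeout(timeout),
    }


if __name__ == "__main__":
    print(build_source("TestTable1 | take 10"))
    print(build_source("TestTable1", timeout=timedelta(minutes=30), no_truncation=True))
```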
Azure Data Explorer as sink
To copy data to Azure Data Explorer, set the type property in the copy activity sink to AzureDataExplorerSink. The following properties are supported in the copy activity sink section:
Property | Description | Required |
---|---|---|
type | The type property of the copy activity sink must be set to: AzureDataExplorerSink. | Yes |
ingestionMappingName | Name of a pre-created mapping on a Kusto table. To map the columns from the source to Azure Data Explorer (which applies to all supported source stores and formats, including CSV/JSON/Avro formats), you can use the copy activity column mapping (implicitly by name or explicitly as configured), Azure Data Explorer mappings, or both. | No |
additionalProperties | A property bag that can be used to specify any ingestion properties that aren't already set by the Azure Data Explorer sink. Specifically, it can be useful for specifying ingestion tags. Learn more from the Azure Data Explorer data ingestion doc. | No |
Example:
"activities":[
    {
        "name": "CopyToAzureDataExplorer",
        "type": "Copy",
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "AzureDataExplorerSink",
                "ingestionMappingName": "<optional Azure Data Explorer mapping name>",
                "additionalProperties": {<additional settings for data ingestion>}
            }
        },
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<Azure Data Explorer output dataset name>",
                "type": "DatasetReference"
            }
        ]
    }
]
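Because ingestionMappingName and additionalProperties are both optional, a sink section can be as small as the type alone. The Python sketch below shows one way to assemble the sink so that unset options are omitted rather than emitted as empty placeholders. The function name `build_sink` is hypothetical, and the contents of the property bag are left to the caller because the supported ingestion property names are not listed in this article.

```python
import json
from typing import Mapping, Optional


def build_sink(ingestion_mapping_name: Optional[str] = None,
               additional_properties: Optional[Mapping[str, str]] = None) -> dict:
    """Build the copy activity sink section for Azure Data Explorer,
    including optional settings only when they are provided."""
    sink = {"type": "AzureDataExplorerSink"}
    if ingestion_mapping_name:
        # Name of a mapping that already exists on the destination table.
        sink["ingestionMappingName"] = ingestion_mapping_name
    if additional_properties:
        # Copy the caller's property bag so later changes don't leak in.
        sink["additionalProperties"] = dict(additional_properties)
    return sink


if __name__ == "__main__":
    print(json.dumps(build_sink(), indent=4))
    print(json.dumps(build_sink(ingestion_mapping_name="<mapping name>"), indent=4))
```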
Lookup activity properties
For more information about the properties, see Lookup activity.
Next steps
For a list of data stores that the copy activity in Azure Data Factory supports as sources and sinks, see supported data stores.
Learn more about how to copy data from Azure Data Factory to Azure Data Explorer.