使用 Azure 数据工厂从 SharePoint Online 列表复制数据Copy data from SharePoint Online List by using Azure Data Factory

适用于: Azure 数据工厂 Azure Synapse Analytics(预览版)

本文概述如何使用 Azure 数据工厂中的复制活动从 SharePoint Online 列表复制数据。This article outlines how to use Copy Activity in Azure Data Factory to copy data from SharePoint Online List. 本文是根据总体概述复制活动的 Azure 数据工厂中的复制活动编写的。The article builds on Copy Activity in Azure Data Factory, which presents a general overview of Copy Activity.

支持的功能Supported capabilities

以下活动支持此 SharePoint Online 列表连接器:This SharePoint Online List connector is supported for the following activities:

可以将数据从 SharePoint Online 列表复制到任何受支持的接收器数据存储。You can copy data from SharePoint Online List to any supported sink data store. 有关复制活动支持作为源和接收器的数据存储的列表,请参阅支持的数据存储和格式For a list of data stores that Copy Activity supports as sources and sinks, see Supported data stores and formats.

具体而言,此 SharePoint Online 列表连接器使用服务主体身份验证,并通过 OData 协议检索数据。Specifically, this SharePoint List Online connector uses service principal authentication and retrieves data via OData protocol.

提示

此连接器支持从 SharePoint Online 列表而不是文件复制数据。This connector supports copying data from SharePoint Online List but not file. 通过从 SharePoint Online 复制文件部分了解如何复制文件。Learn how to copy file from Copy file from SharePoint Online section.

先决条件Prerequisites

SharePoint Online 列表连接器使用服务主体身份验证连接到 SharePoint。The SharePoint List Online connector uses service principal authentication to connect to SharePoint. 请按照以下步骤进行设置:Follow these steps to set it up:

  1. 遵循将应用程序注册到 Azure AD 租户,在 Azure Active Directory (Azure AD) 中注册一个应用程序实体。Register an application entity in Azure Active Directory (Azure AD) by following Register your application with an Azure AD tenant. 记下下面的值,这些值用于定义链接服务:Make note of the following values, which you use to define the linked service:

    • 应用程序 IDApplication ID
    • 应用程序密钥Application key
    • 租户 IDTenant ID
  2. 向您注册的应用程序授予 SharePoint Online 网站权限:Grant SharePoint Online site permission to your registered application:

    备注

    此操作需要 SharePoint Online 网站所有者权限。This operation requires SharePoint Online site owner permission. 可以通过如下步骤找到所有者:转到网站主页 -> 单击右角的“X 成员”-> 检查谁拥有“所有者”角色。You can find the owner by going to the site home page -> click the "X members" in the right corner -> check who has the "Owner" role.

    1. 打开 SharePoint Online 网站链接,例如 https://[your_site_url]/_layouts/15/appinv.aspx(替换网站 URL)。Open SharePoint Online site link e.g. https://[your_site_url]/_layouts/15/appinv.aspx (replace the site URL).

    2. 搜索已注册的应用程序 ID,填写空字段,然后单击“创建”。Search the application ID you registered, fill the empty fields, and click "Create".

      • 应用程序域:localhost.comApp Domain: localhost.com
      • 重定向 URL:https://www.localhost.comRedirect URL: https://www.localhost.com
      • 权限请求 XML:Permission Request XML:
      <AppPermissionRequests AllowAppOnlyPolicy="true">
          <AppPermissionRequest Scope="http://sharepoint/content/sitecollection/web" Right="Read"/>
      </AppPermissionRequests>
      

      SharePoint 授予权限

    3. 对此应用单击“信任它”。Click "Trust It" for this app.

入门Get started

若要使用管道执行复制活动,可以使用以下工具或 SDK 之一:To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

对于特定于 SharePoint Online 列表连接器的数据工厂实体,以下部分提供了有关用于定义这些实体的属性的详细信息。The following sections provide details about properties you can use to define Data Factory entities that are specific to SharePoint Online List connector.

链接服务属性Linked service properties

SharePoint Online 列表链接的服务支持以下属性:The following properties are supported for an SharePoint Online List linked service:

属性Property 说明Description 必需Required
typetype type 属性必须设置为:SharePointOnlineList。 ****The type property must be set to: SharePointOnlineList. Yes
siteUrlsiteUrl SharePoint Online 网站 URL,例如 https://contoso.sharepoint.com/sites/siteNameThe SharePoint Online site url, e.g. https://contoso.sharepoint.com/sites/siteName. Yes
servicePrincipalIdservicePrincipalId 在 Azure Active Directory 中注册的应用程序的应用程序(客户端)ID。The Application (client) ID of the application registered in Azure Active Directory. Yes
servicePrincipalKeyservicePrincipalKey 应用程序的密钥。The application's key. 将此字段标记为 SecureString 以安全地将其存储在数据工厂中或引用存储在 Azure Key Vault 中的机密Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. Yes
tenantIdtenantId 应用程序所在的租户 ID。The tenant ID under which your application resides. Yes
connectViaconnectVia 用于连接到数据存储的 Integration RuntimeThe Integration Runtime to use to connect to the data store. 请参阅本文前面的先决条件,了解更多信息。Learn more from Prerequisites, earlier in this article. 如果未指定,则使用默认 Azure Integration Runtime。If not specified, the default Azure Integration Runtime is used. No

示例:Example:

{
    "name": "SharePointOnlineList",
    "properties": {
        "type": "SharePointOnlineList",
        "typeProperties": {
            "siteUrl": "<site URL>",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenantId": "<tenant ID>"
        }
    }
}

数据集属性Dataset properties

有关可用于定义数据集的各部分和属性的完整列表,请参阅数据集和链接服务For a full list of sections and properties that are available for defining datasets, see Datasets and linked services. 以下部分提供 SAP 表数据集支持的属性列表。The following section provides a list of the properties supported by the SAP table dataset.

propertiesProperty 说明Description 必选Required
typetype 数据集的 type 属性必须设置为 SharePointOnlineLResource。 The type property of the dataset must be set to SharePointOnlineLResource. Yes
listNamelistName SharePoint Online 列表的名称。The name of the SharePoint Online List. Yes

示例Example

{
    "name": "SharePointOnlineListDataset",
    "properties":
    {
        "type": "SharePointOnlineListResource",
        "linkedServiceName": {
            "referenceName": "<SharePoint Online List linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties":
        {
            "listName": "<name of the list>"
        }
    }
}

复制活动属性Copy Activity properties

有关可用于定义活动的各个部分和属性的完整列表,请参阅管道For a full list of sections and properties that are available for defining activities, see Pipelines. 以下部分提供 SharePoint Online 列表源支持的属性列表。The following section provides a list of the properties supported by the SharePoint Online List source.

SharePoint Online 列表作为源SharePoint Online List as source

若要从 SharePoint Online 列表复制数据,复制活动“源”部分支持以下属性:To copy data from SharePoint Online List, the following properties are supported in the Copy Activity source section:

propertiesProperty 说明Description 必选Required
typetype 复制活动源的 type 属性必须设置为 SharePointOnlineListSource。 The type property of the Copy Activity source must be set to SharePointOnlineListSource. Yes
queryquery 用于筛选数据的自定义 OData 查询选项。Custom OData query options for filtering data. 示例:"$top=10&$select=Title,Number"Example: "$top=10&$select=Title,Number". No
httpRequestTimeouthttpRequestTimeout 用于获取响应的 HTTP 请求的超时(以秒为单位)。The timeout (in second) for the HTTP request to get a response. 默认值为 300(5 分钟)。Default is 300 (5 minutes). No

示例Example

"activities":[
    {
        "name": "CopyFromSharePointOnlineList",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<SharePoint Online List input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SharePointOnlineListSource",
                "query": "<OData query e.g. $top=10&$select=Title,Number>"
            }, 
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

SharePoint Online 列表的数据类型映射Data type mapping for SharePoint Online List

从 SharePoint Online 列表复制数据时,会在 SharePoint Online 列表数据类型和 Azure 数据工厂临时数据类型之间使用以下映射。When you copy data from SharePoint Online List, the following mappings are used between SharePoint Online List data types and Azure Data Factory interim data types.

SharePoint Online 数据类型SharePoint Online data type OData 数据类型OData data type Azure 数据工厂临时数据类型Azure Data Factory interim data type
单个文本行Single line of text Edm.StringEdm.String 字符串String
多行文本Multiple lines of text Edm.StringEdm.String 字符串String
选项(可供选择的菜单)Choice (menu to choose from) Edm.StringEdm.String 字符串String
数字(1、1.0、100)Number (1, 1.0, 100) Edm.DoubleEdm.Double DoubleDouble
货币($、¥、€)Currency ($, ¥, €) Edm.DoubleEdm.Double DoubleDouble
日期和时间Date and Time Edm.DateTimeEdm.DateTime DateTimeDateTime
查阅项(此网站已有的信息)Lookup (information already on this site) Edm.Int32Edm.Int32 Int32Int32
是/否(复选框)Yes/No (check box) Edm.BooleanEdm.Boolean BooleanBoolean
个人或团体Person or Group Edm.Int32Edm.Int32 Int32Int32
超链接或图片Hyperlink or Picture Edm.StringEdm.String 字符串String
计算(基于其他列的计算)Calculated (calculation based on other columns) Edm.String/Edm.Double/Edm.DateTime/Edm.BooleanEdm.String / Edm.Double / Edm.DateTime / Edm.Boolean 字符串/Double/DateTime/BooleanString / Double / DateTime / Boolean
附件Attachment 不支持Not supported
任务结果Task Outcome 不支持Not supported
外部数据External Data 不支持Not supported
托管元数据Managed Metadata 不支持Not supported

从 SharePoint Online 复制文件Copy file from SharePoint Online

您可以使用以下方法从 SharePoint Online 复制文件:使用 Web 活动进行身份验证并从 SPO 中获取访问令牌,然后传递到后续的复制活动以使用 HTTP 连接器作为源复制数据。 You can copy file from SharePoint Online by using Web activity to authenticate and grab access token from SPO, then passing to subsequent Copy activity to copy data with HTTP connector as source.

SharePoint 复制文件流

  1. 按照先决条件部分,创建 AAD 应用程序并向 SharePoint Online 授予权限。Follow the Prerequisites section to create AAD application and grant permission to SharePoint Online.

  2. 创建 Web 活动以从 SharePoint Online 获取访问令牌:Create a Web Activity to get the access token from SharePoint Online:

    • URL:https://accounts.accesscontrol.chinacloudapi.cn/[Tenant-ID]/tokens/OAuth/2URL: https://accounts.accesscontrol.chinacloudapi.cn/[Tenant-ID]/tokens/OAuth/2. 替换租户 ID。Replace the tenant ID.
    • 方法:POSTMethod: POST
    • 标头:Headers:
      • 内容类型:application/x-www-form-urlencodedContent-Type: application/x-www-form-urlencoded
    • 正文:grant_type=client_credentials&client_id=[Client-ID]@[Tenant-ID]&client_secret=[Client-Secret]&resource=00000003-0000-0ff1-ce00-000000000000/[Tenant-Name].sharepoint.com@[Tenant-ID]Body: grant_type=client_credentials&client_id=[Client-ID]@[Tenant-ID]&client_secret=[Client-Secret]&resource=00000003-0000-0ff1-ce00-000000000000/[Tenant-Name].sharepoint.com@[Tenant-ID]. 替换客户端 ID、客户端密码、租户 ID 和租户名称。Replace the client ID, client secret, tenant ID and tenant name.

    注意

    在 Web 活动中,将“安全输出”选项设置为 true,以防止令牌值以纯文本格式记录。Set the Secure Output option to true in Web activity to prevent the token value from being logged in plain text. 使用此值的任何其他活动都应将其“安全输入”选项设置为 true。Any further activities that consume this value should have their Secure Input option set to true.

  3. 与 HTTP 连接器作为源的复制活动链接以复制 SharePoint Online 文件内容:Chain with a Copy activity with HTTP connector as source to copy SharePoint Online file content:

    • HTTP 链接服务:HTTP linked service:
      • 基 URL:https://[site-url]/_api/web/GetFileByServerRelativeUrl('[relative-path-to-file]')/$valueBase URL: https://[site-url]/_api/web/GetFileByServerRelativeUrl('[relative-path-to-file]')/$value. 替换网站 URL 和文件的相对路径。Replace the site URL and relative path to file. 文件的相对路径示例为 /sites/site2/Shared Documents/TestBook.xlsxSample relative path to file as /sites/site2/Shared Documents/TestBook.xlsx.
      • 身份验证类型:匿名(稍后使用复制活动源中配置的持有者令牌)Authentication type: Anonymous (to use the Bearer token configured in copy activity source later)
    • 数据集:选择所需的格式。Dataset: choose the format you want. 若要按原样复制文件,请选择“二进制”类型。To copy file as-is, select "Binary" type.
    • 复制活动源:Copy activity source:
      • 请求方法:GETRequest method: GET
      • 其他标头:使用以下表达式@{concat('Authorization: Bearer ', activity('<Web-activity-name>').output.access_token)},该表达式将上游 Web 活动生成的持有者令牌用作授权标头。Additional header: use the following expression@{concat('Authorization: Bearer ', activity('<Web-activity-name>').output.access_token)}, which uses the Bearer token generated by the upstream Web activity as authorization header. 替换 Web 活动名称。Replace the Web activity name.
    • 照常配置复制活动接收器。Configure the copy activity sink as usual.

“查找”活动属性Lookup activity properties

若要详细了解这些属性,请查看“查找”活动To learn details about the properties, check Lookup activity.

后续步骤Next steps

有关 Azure 数据工厂中复制活动支持用作源和接收器的数据存储的列表,请参阅支持的数据存储和格式For a list of data stores that Copy Activity supports as sources and sinks in Azure Data Factory, see Supported data stores and formats.