Copy data from Office 365 into Azure using Azure Data Factory

APPLIES TO: Azure Data Factory, Azure Synapse Analytics

Azure Data Factory integrates with Microsoft Graph data connect. This lets you bring the rich organizational data in your Office 365 tenant into Azure in a scalable way, build analytics applications, and extract insights from these valuable data assets. Integration with Privileged Access Management provides secured access control for the valuable curated data in Office 365. For an overview of Microsoft Graph data connect, see this link; for licensing information, see this link.

This article outlines how to use the Copy Activity in Azure Data Factory to copy data from Office 365. It builds on the copy activity overview article, which presents a general overview of copy activity.

Supported capabilities

The ADF Office 365 connector and Microsoft Graph data connect enable at-scale ingestion of different types of datasets from Exchange email-enabled mailboxes. These include address book contacts, calendar events, email messages, user information, mailbox settings, and so on. See here for the complete list of available datasets.

For now, a single copy activity can only copy data from Office 365 into Azure Blob Storage or Azure Data Lake Storage Gen2, in JSON format (type setOfObjects). To load Office 365 data into other types of data stores, or in other formats, chain the first copy activity with a subsequent copy activity. That second activity loads the data into any supported ADF destination store (see the "Supported as a sink" column in the "Supported data stores and formats" table).
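The chaining described above can be sketched by generating the pipeline JSON from a script. This minimal Python sketch builds two copy activities, where the second runs only after the first succeeds, using the pipeline `dependsOn` / `dependencyConditions` syntax. Several parts are hypothetical placeholders: the activity and pipeline names, the dataset names, and the choice of an Azure SQL Database sink (`AzureSqlSink`) for the second hop. The source/sink types of the second activity depend on the datasets you actually use.

```python
import json


def chained_copy_pipeline(o365_dataset: str, staging_dataset: str, final_dataset: str) -> dict:
    """Build pipeline JSON that stages Office 365 data as JSON in Blob Storage,
    then copies it on to a second store once the first copy succeeds.

    All names are hypothetical placeholders.
    """
    stage = {
        "name": "CopyFromO365ToStaging",
        "type": "Copy",
        "inputs": [{"referenceName": o365_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": staging_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Add dateFilterColumn/startTime/endTime here if the dataset requires them.
            "source": {"type": "Office365Source"},
            "sink": {"type": "BlobSink"},
        },
    }
    load = {
        "name": "CopyFromStagingToFinal",
        "type": "Copy",
        # The second copy starts only after the Office 365 extraction succeeds.
        "dependsOn": [{"activity": stage["name"], "dependencyConditions": ["Succeeded"]}],
        "inputs": [{"referenceName": staging_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": final_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BlobSource"},
            "sink": {"type": "AzureSqlSink"},  # assumption: an Azure SQL Database sink
        },
    }
    return {"name": "O365ChainedCopy", "properties": {"activities": [stage, load]}}


if __name__ == "__main__":
    print(json.dumps(chained_copy_pipeline("O365Events", "StagingJson", "EventsTable"), indent=2))
```

The staging dataset is both the output of the first activity and the input of the second. That shared dataset is what links the two hops.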

Important

  • The Azure subscription containing the data factory and the sink data store must be under the same Azure Active Directory (Azure AD) tenant as the Office 365 tenant.
  • Ensure that the Azure Integration Runtime used by the copy activity, as well as the destination, are in the same region as the Office 365 tenant users' mailboxes. See here to understand how the Azure IR location is determined. See the table here for the list of supported Office regions and their corresponding Azure regions.
  • Service principal authentication is the only authentication mechanism supported for Azure Blob Storage and Azure Data Lake Storage Gen2 as destination stores.
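The three requirements above can be checked before a run with a small preflight helper. This is a minimal Python sketch. The `CopySetup` fields are hypothetical and represent values you would collect yourself. `mailbox_region` is assumed to be the Azure region that the Office-to-Azure region table maps your tenant's Office region to.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CopySetup:
    """Hypothetical summary of a planned Office 365 copy; field names are illustrative."""
    o365_tenant_id: str
    factory_tenant_id: str  # Azure AD tenant of the subscription holding the data factory
    sink_tenant_id: str     # Azure AD tenant of the subscription holding the sink store
    ir_region: str          # region of the Azure Integration Runtime used by the copy
    sink_region: str
    mailbox_region: str     # Azure region that corresponds to the tenant's Office region
    sink_auth: str          # e.g. "ServicePrincipal" or "AccountKey"


def _same_region(a: str, b: str) -> bool:
    # Treat "West Europe" and "westeurope" as the same region name.
    return a.replace(" ", "").lower() == b.replace(" ", "").lower()


def preflight_problems(s: CopySetup) -> List[str]:
    """List violations of the requirements above; an empty list means none were found."""
    problems = []
    if s.factory_tenant_id != s.o365_tenant_id or s.sink_tenant_id != s.o365_tenant_id:
        problems.append("data factory and sink must be under the Office 365 tenant's Azure AD tenant")
    if not _same_region(s.ir_region, s.mailbox_region):
        problems.append("Azure Integration Runtime region differs from the mailbox region")
    if not _same_region(s.sink_region, s.mailbox_region):
        problems.append("destination region differs from the mailbox region")
    if s.sink_auth != "ServicePrincipal":
        problems.append("Blob Storage / ADLS Gen2 sink must use service principal authentication")
    return problems
```

The helper returns every problem it finds rather than stopping at the first one, so a single check reports the whole misconfiguration.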

Prerequisites

To copy data from Office 365 into Azure, you need to complete the following prerequisite steps:

  • Your Office 365 tenant admin must complete the onboarding actions described here.
  • Create and configure an Azure AD web application in Azure Active Directory. For instructions, see Create an Azure AD application.
  • Make note of the following values, which you will use to define the linked service for Office 365:
    • Tenant ID
    • Application (client) ID and application key
  • Add the user identity that will make the data access request as an owner of the Azure AD web application (Azure AD web application > Settings > Owners > Add owner).
    • The user identity must be in the Office 365 organization you are getting data from and must not be a Guest user.

Approving new data access requests

A context is the combination of the data table being accessed, the destination account the data is loaded into, and the user identity making the data access request. The first time you request data for a given context, the copy activity status shows as "In Progress". You see the status "RequestingConsent" only when you select the "Details" link under Actions. Before data extraction can proceed, a member of the data access approver group must approve the request in Privileged Access Management.

See here for how the approver can approve the data access request. See here for an explanation of the overall integration with Privileged Access Management, including how to set up the data access approver group.
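If you monitor runs from a script, a polling loop can handle the approval wait described above. This is a minimal Python sketch. It assumes a hypothetical `get_status` callback that you would implement against your own monitoring tooling; it is not an ADF SDK function. The callback returns the two labels named in this section: the activity status and the status shown under "Details". The exact label strings are assumptions taken from the text above.

```python
import time
from typing import Callable, Tuple

# Labels as described in this section (assumed values).
IN_PROGRESS = "In Progress"
REQUESTING_CONSENT = "RequestingConsent"


def wait_for_consent(get_status: Callable[[], Tuple[str, str]],
                     poll_seconds: float = 60.0,
                     max_polls: int = 60,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the copy activity is no longer waiting for data access approval.

    get_status is a hypothetical callback returning (activity status, "Details" status).
    Returns the activity status once consent is no longer pending, or
    "awaiting-approval" if the request is still pending after max_polls checks.
    """
    for _ in range(max_polls):
        activity_status, detail_status = get_status()
        if activity_status == IN_PROGRESS and detail_status == REQUESTING_CONSENT:
            # Still waiting for a member of the data access approver group to
            # approve the request in Privileged Access Management.
            sleep(poll_seconds)
            continue
        return activity_status
    return "awaiting-approval"
```

The `sleep` function is injectable so the loop can be exercised without real waiting. Approval is a manual step, so a generous `max_polls` is usually more appropriate than a tight timeout.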

Policy validation

Suppose ADF is created as part of a managed app, and Azure Policy assignments are made on resources within the management resource group. In that case, ADF checks on every copy activity run that the policy assignments are enforced. See here for a list of supported policies.

Getting started

Tip

For a walkthrough of using the Office 365 connector, see the Load data from Office 365 article.

You can create a pipeline with the copy activity by using one of the following tools or SDKs. Select a link to go to a tutorial with step-by-step instructions for creating a pipeline with a copy activity.

The following sections provide details about properties that are used to define Data Factory entities specific to the Office 365 connector.

Linked service properties

The following properties are supported for the Office 365 linked service:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property must be set to: Office365 | Yes |
| office365TenantId | Azure tenant ID to which the Office 365 account belongs. | Yes |
| servicePrincipalTenantId | The tenant under which your Azure AD web application resides. | Yes |
| servicePrincipalId | The application's client ID. | Yes |
| servicePrincipalKey | The application's key. Mark this field as a SecureString to store it securely in Data Factory. | Yes |
| connectVia | The Integration Runtime used to connect to the data store. If not specified, the default Azure Integration Runtime is used. | No |

Note

The difference between office365TenantId and servicePrincipalTenantId, and the value to provide for each:

  • If you are an enterprise developer building an application against Office 365 data for your own organization's use, supply the same tenant ID for both properties: your organization's AAD tenant ID.
  • If you are an ISV developer building an application for your customers, office365TenantId is your customer's (the application installer's) AAD tenant ID, and servicePrincipalTenantId is your company's AAD tenant ID.

Example:

{
    "name": "Office365LinkedService",
    "properties": {
        "type": "Office365",
        "typeProperties": {
            "office365TenantId": "<Office 365 tenant id>",
            "servicePrincipalTenantId": "<AAD app service principal tenant id>",
            "servicePrincipalId": "<AAD app service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<AAD app service principal key>"
            }
        }
    }
}
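If you generate the linked service JSON above from a script, a small builder makes the tenant ID rule from the note explicit. An enterprise developer passes the same tenant ID twice. An ISV passes the customer's tenant as `o365_tenant_id` and its own tenant as `sp_tenant_id`. This is a minimal Python sketch. The function name is hypothetical, and the optional `connectVia` reference assumes an integration runtime name you supply.

```python
import json
from typing import Optional


def office365_linked_service(name: str, o365_tenant_id: str, sp_tenant_id: str,
                             sp_client_id: str, sp_key: str,
                             integration_runtime: Optional[str] = None) -> dict:
    """Build Office 365 linked service JSON.

    Enterprise: o365_tenant_id == sp_tenant_id (your organization's tenant).
    ISV: o365_tenant_id is the customer's tenant, sp_tenant_id is your company's.
    """
    props = {
        "type": "Office365",
        "typeProperties": {
            "office365TenantId": o365_tenant_id,
            "servicePrincipalTenantId": sp_tenant_id,
            "servicePrincipalId": sp_client_id,
            # SecureString so the key is stored securely in Data Factory.
            "servicePrincipalKey": {"type": "SecureString", "value": sp_key},
        },
    }
    if integration_runtime:
        # Optional; when omitted, the default Azure Integration Runtime is used.
        props["connectVia"] = {"referenceName": integration_runtime,
                               "type": "IntegrationRuntimeReference"}
    return {"name": name, "properties": props}


if __name__ == "__main__":
    own_tenant = "<your organization's tenant id>"
    print(json.dumps(office365_linked_service(
        "Office365LinkedService", own_tenant, own_tenant,
        "<AAD app service principal id>", "<AAD app service principal key>"), indent=2))
```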

Dataset properties

For a full list of sections and properties available for defining datasets, see the datasets article. This section provides a list of properties supported by the Office 365 dataset.

To copy data from Office 365, the following properties are supported:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the dataset must be set to: Office365Table | Yes |
| tableName | Name of the dataset to extract from Office 365. See here for the list of Office 365 datasets available for extraction. | Yes |

If you were setting dateFilterColumn, startTime, endTime, and userScopeFilterUri in the dataset, that is still supported as-is. Going forward, however, we recommend using the new model and setting these properties in the activity source.
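When moving existing pipelines to the new model, a small script can shift the filter properties from the dataset to the copy activity source. This is a minimal Python sketch. It assumes the legacy keys sit under the dataset's `typeProperties`. The helper name is hypothetical. Letting a value already present on the source win is this sketch's own design choice, not documented ADF precedence.

```python
from copy import deepcopy
from typing import Dict, Tuple

# Filter properties that older pipelines set on the dataset and that the
# newer model sets on the copy activity source instead.
LEGACY_FILTER_KEYS = ("dateFilterColumn", "startTime", "endTime", "userScopeFilterUri")


def move_filters_to_source(dataset: Dict, source: Dict) -> Tuple[Dict, Dict]:
    """Return copies of (dataset, source) with legacy filter keys moved to the source."""
    ds, src = deepcopy(dataset), deepcopy(source)
    type_props = ds["properties"].get("typeProperties", {})
    for key in LEGACY_FILTER_KEYS:
        if key in type_props:
            # Keep a value already set on the source; otherwise take the dataset's.
            src.setdefault(key, type_props[key])
            del type_props[key]
    return ds, src
```

The inputs are deep-copied, so the original definitions are left untouched and can be compared against the result before you redeploy.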

Example

{
    "name": "DS_May2019_O365_Message",
    "properties": {
        "type": "Office365Table",
        "linkedServiceName": {
            "referenceName": "<Office 365 linked service name>",
            "type": "LinkedServiceReference"
        },
        "schema": [],
        "typeProperties": {
            "tableName": "BasicDataSet_v0.Event_v1"
        }
    }
}

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Office 365 source.

Office 365 as source

To copy data from Office 365, the following properties are supported in the copy activity source section:

| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the copy activity source must be set to: Office365Source | Yes |
| allowedGroups | Group selection predicate. Use this property to select up to 10 user groups for whom the data will be retrieved. If no groups are specified, data is returned for the entire organization. | No |
| userScopeFilterUri | When the allowedGroups property is not specified, you can use a predicate expression, applied across the entire tenant, to filter the specific rows to extract from Office 365. The predicate format should match the query format of Microsoft Graph APIs, e.g. https://graph.microsoft.com/v1.0/users?$filter=Department eq 'Finance'. | No |
| dateFilterColumn | Name of the DateTime filter column. Use this property to limit the time range for which Office 365 data is extracted. | Yes if the dataset has one or more DateTime columns. See here for the list of datasets that require this DateTime filter. |
| startTime | Start DateTime value to filter on. | Yes if dateFilterColumn is specified |
| endTime | End DateTime value to filter on. | Yes if dateFilterColumn is specified |
| outputColumns | Array of the columns to copy to the sink. | No |
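The rules in the table above can be checked before deploying a pipeline. This is a minimal Python sketch; the validator name is hypothetical. Whether the dataset has DateTime columns is passed in by the caller, since that depends on the dataset list referenced above. The function also applies two extra checks of its own, which are not documented rules. It treats combining allowedGroups with userScopeFilterUri as an error, a conservative reading. It also requires startTime to be earlier than endTime.

```python
from datetime import datetime
from typing import List


def _parse(ts: str) -> datetime:
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def validate_office365_source(source: dict, dataset_has_datetime_columns: bool) -> List[str]:
    """Return human-readable problems with an Office365Source definition (empty if none)."""
    errors = []
    if source.get("type") != "Office365Source":
        errors.append('type must be "Office365Source"')
    groups = source.get("allowedGroups") or []
    if len(groups) > 10:
        errors.append("allowedGroups accepts at most 10 user groups")
    if groups and "userScopeFilterUri" in source:
        # Conservative choice: the filter URI is meant for when no groups are given.
        errors.append("userScopeFilterUri is meant for use without allowedGroups")
    if dataset_has_datetime_columns and "dateFilterColumn" not in source:
        errors.append("dateFilterColumn is required for this dataset")
    if "dateFilterColumn" in source:
        missing = [k for k in ("startTime", "endTime") if k not in source]
        errors.extend(f"{k} is required when dateFilterColumn is set" for k in missing)
        # Extra sanity check (not a documented rule): the window must not be empty or reversed.
        if not missing and _parse(source["startTime"]) >= _parse(source["endTime"]):
            errors.append("startTime must be earlier than endTime")
    return errors
```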

Example:

"activities": [
    {
        "name": "CopyFromO365ToBlob",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Office 365 input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "Office365Source",
                "dateFilterColumn": "CreatedDateTime",
                "startTime": "2019-04-28T16:00:00.000Z",
                "endTime": "2019-05-05T16:00:00.000Z",
                "userScopeFilterUri": "https://graph.microsoft.com/v1.0/users?$filter=Department eq 'Finance'",
                "outputColumns": [
                    {
                        "name": "Id"
                    },
                    {
                        "name": "CreatedDateTime"
                    },
                    {
                        "name": "LastModifiedDateTime"
                    },
                    {
                        "name": "ChangeKey"
                    },
                    {
                        "name": "Categories"
                    },
                    {
                        "name": "OriginalStartTimeZone"
                    },
                    {
                        "name": "OriginalEndTimeZone"
                    },
                    {
                        "name": "ResponseStatus"
                    },
                    {
                        "name": "iCalUId"
                    },
                    {
                        "name": "ReminderMinutesBeforeStart"
                    },
                    {
                        "name": "IsReminderOn"
                    },
                    {
                        "name": "HasAttachments"
                    },
                    {
                        "name": "Subject"
                    },
                    {
                        "name": "Body"
                    },
                    {
                        "name": "Importance"
                    },
                    {
                        "name": "Sensitivity"
                    },
                    {
                        "name": "Start"
                    },
                    {
                        "name": "End"
                    },
                    {
                        "name": "Location"
                    },
                    {
                        "name": "IsAllDay"
                    },
                    {
                        "name": "IsCancelled"
                    },
                    {
                        "name": "IsOrganizer"
                    },
                    {
                        "name": "Recurrence"
                    },
                    {
                        "name": "ResponseRequested"
                    },
                    {
                        "name": "ShowAs"
                    },
                    {
                        "name": "Type"
                    },
                    {
                        "name": "Attendees"
                    },
                    {
                        "name": "Organizer"
                    },
                    {
                        "name": "WebLink"
                    },
                    {
                        "name": "Attachments"
                    },
                    {
                        "name": "BodyPreview"
                    },
                    {
                        "name": "Locations"
                    },
                    {
                        "name": "OnlineMeetingUrl"
                    },
                    {
                        "name": "OriginalStart"
                    },
                    {
                        "name": "SeriesMasterId"
                    }
                ]
            },
            "sink": {
                "type": "BlobSink"
            }
        }
    }
]
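Writing outputColumns by hand, as in the example above, is repetitive. If you generate the copy activity JSON from a script, a helper can expand a list of column names into the required shape. This is a minimal Python sketch. The helper and variable names are hypothetical, and the column subset is taken from the example above. Rejecting duplicate names is this sketch's own choice.

```python
import json
from typing import Iterable, List


def output_columns(names: Iterable[str]) -> List[dict]:
    """Expand column names into the outputColumns shape [{"name": ...}, ...], keeping order."""
    cols = list(names)
    if len(set(cols)) != len(cols):
        raise ValueError("duplicate column names in outputColumns")
    return [{"name": n} for n in cols]


# A subset of the event columns from the example above.
EVENT_COLUMNS = ["Id", "CreatedDateTime", "LastModifiedDateTime", "Subject",
                 "Start", "End", "Organizer", "Attendees"]

source = {
    "type": "Office365Source",
    "dateFilterColumn": "CreatedDateTime",
    "startTime": "2019-04-28T16:00:00.000Z",
    "endTime": "2019-05-05T16:00:00.000Z",
    "outputColumns": output_columns(EVENT_COLUMNS),
}

if __name__ == "__main__":
    print(json.dumps(source, indent=2))
```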

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.