Copy data from MongoDB using Azure Data Factory

APPLIES TO: Azure Data Factory, Azure Synapse Analytics (Preview)

This article outlines how to use the Copy Activity in Azure Data Factory to copy data from a MongoDB database. It builds on the copy activity overview article, which presents a general overview of the copy activity.

Important

ADF has released this new version of the MongoDB connector, which provides better native MongoDB support. If you are using the previous MongoDB connector in your solution, which is supported as-is for backward compatibility, refer to the MongoDB connector (legacy) article.

Supported capabilities

You can copy data from a MongoDB database to any supported sink data store. For a list of data stores that are supported as sources/sinks by the copy activity, see the Supported data stores table.

Specifically, this MongoDB connector supports versions up to 3.4.

Prerequisites

If your data store is configured in one of the following ways, you need to set up a self-hosted integration runtime to connect to the data store:

  • The data store is located inside an on-premises network, inside an Azure virtual network, or inside Amazon Virtual Private Cloud.
  • The data store is a managed cloud data service where access is restricted to IPs that are whitelisted in the firewall rules.

Getting started

To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

The following sections provide details about properties that are used to define Data Factory entities specific to the MongoDB connector.

Linked service properties

The following properties are supported for the MongoDB linked service:

  • type — The type property must be set to: MongoDbV2. Required: Yes.
  • connectionString — Specify the MongoDB connection string, e.g. mongodb://[username:password@]host[:port][/[database][?options]]. Refer to the MongoDB manual on connection string for more details. You can also put a password in Azure Key Vault and pull the password configuration out of the connection string. Refer to Store credentials in Azure Key Vault for more details. Required: Yes.
  • database — Name of the database that you want to access. Required: Yes.
  • connectVia — The Integration Runtime to be used to connect to the data store. Learn more from the Prerequisites section. If not specified, the default Azure Integration Runtime is used. Required: No.

Example:

{
    "name": "MongoDBLinkedService",
    "properties": {
        "type": "MongoDbV2",
        "typeProperties": {
            "connectionString": "mongodb://[username:password@]host[:port][/[database][?options]]",
            "database": "myDatabase"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
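Characters such as @, :, and / in a username or password will break the connection string unless they are percent-encoded. As a minimal sketch of how the URI pieces fit together (the helper function and credential values below are hypothetical, for illustration only, not part of ADF), the escaping can be done with the Python standard library:

```python
from urllib.parse import quote_plus

def build_connection_string(host, port, database,
                            username=None, password=None, options=None):
    """Assemble a mongodb:// connection string, percent-encoding the
    credentials so characters like '@', ':' and '/' do not break the URI.
    Hypothetical helper for illustration; not part of ADF."""
    auth = ""
    if username is not None:
        auth = quote_plus(username)
        if password is not None:
            auth += ":" + quote_plus(password)
        auth += "@"
    opts = ""
    if options:
        opts = "?" + "&".join(f"{k}={v}" for k, v in options.items())
    return f"mongodb://{auth}{host}:{port}/{database}{opts}"

# Hypothetical host and credential values.
print(build_connection_string("myserver", 27017, "myDatabase",
                              username="admin", password="p@ss/word"))
# → mongodb://admin:p%40ss%2Fword@myserver:27017/myDatabase
```

The same escaping applies however you assemble the string; only the encoded form parses as a valid MongoDB URI.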

Dataset properties

For a full list of sections and properties that are available for defining datasets, see Datasets and linked services. The following properties are supported for the MongoDB dataset:

  • type — The type property of the dataset must be set to: MongoDbV2Collection. Required: Yes.
  • collectionName — Name of the collection in the MongoDB database. Required: Yes.

Example:

{
    "name": "MongoDbDataset",
    "properties": {
        "type": "MongoDbV2Collection",
        "typeProperties": {
            "collectionName": "<Collection name>"
        },
        "schema": [],
        "linkedServiceName": {
            "referenceName": "<MongoDB linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the MongoDB source.

MongoDB as source

The following properties are supported in the copy activity source section:

  • type — The type property of the copy activity source must be set to: MongoDbV2Source. Required: Yes.
  • filter — Specifies a selection filter using query operators. To return all documents in a collection, omit this parameter or pass an empty document ({}). Required: No.
  • cursorMethods.project — Specifies the fields to return in the documents for projection. To return all fields in the matching documents, omit this parameter. Required: No.
  • cursorMethods.sort — Specifies the order in which the query returns matching documents. Refer to cursor.sort(). Required: No.
  • cursorMethods.limit — Specifies the maximum number of documents the server returns. Refer to cursor.limit(). Required: No.
  • cursorMethods.skip — Specifies the number of documents to skip and from where MongoDB begins to return results. Refer to cursor.skip(). Required: No.
  • batchSize — Specifies the number of documents to return in each batch of the response from the MongoDB instance (the default is 100). In most cases, modifying the batch size does not affect the user or the application. Cosmos DB limits each batch to no more than 40 MB in size, which is the sum of the sizes of the batchSize number of documents, so decrease this value if your documents are large. Required: No.
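The 40 MB per-batch cap means batchSize and average document size trade off against each other. As a rough back-of-the-envelope sketch (the helper below is hypothetical, not an ADF API), a safe batchSize can be derived from an estimated average document size:

```python
def max_batch_size(avg_doc_bytes, cap_bytes=40 * 1024 * 1024, default=100):
    """Pick a batchSize so that avg_doc_bytes * batchSize stays under the
    per-batch cap (40 MB assumed, per the table above), without exceeding
    the default of 100. Hypothetical helper for illustration only."""
    return min(default, max(1, cap_bytes // avg_doc_bytes))

print(max_batch_size(512 * 1024))  # 0.5 MB documents → batchSize 80
print(max_batch_size(4 * 1024))    # small documents → default 100 is fine
```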

Tip

ADF supports consuming BSON documents in Strict mode. Make sure your filter query is in Strict mode instead of Shell mode. More details can be found in the MongoDB manual.
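In Strict mode (MongoDB extended JSON), shell helpers such as ISODate("...") and ObjectId("...") are replaced by plain JSON objects with $date and $oid keys. A minimal sketch of building such a filter (the helper functions are hypothetical, for illustration only):

```python
import json
from datetime import datetime, timezone

def iso_date(dt):
    """Strict-mode date literal: {"$date": "..."} instead of ISODate(...)."""
    return {"$date": dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")}

def object_id(hex_str):
    """Strict-mode ObjectId literal: {"$oid": "..."} instead of ObjectId(...)."""
    return {"$oid": hex_str}

# Shell mode (NOT accepted):
#   {datetimeData: {$gte: ISODate("2018-12-11T00:00:00.000Z")}}
# Strict mode (accepted) — the whole filter is valid JSON:
flt = {
    "datetimeData": {
        "$gte": iso_date(datetime(2018, 12, 11, tzinfo=timezone.utc)),
        "$lt": iso_date(datetime(2018, 12, 12, tzinfo=timezone.utc)),
    },
    "_id": object_id("5acd7c3d0000000000000000"),
}
print(json.dumps(flt))
```

Because the Strict-mode form is plain JSON, it can be serialized with any JSON library and pasted into the filter property directly.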

Example:

"activities":[
    {
        "name": "CopyFromMongoDB",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<MongoDB input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "MongoDbV2Source",
                "filter": "{datetimeData: {$gte: ISODate(\"2018-12-11T00:00:00.000Z\"),$lt: ISODate(\"2018-12-12T00:00:00.000Z\")}, _id: ObjectId(\"5acd7c3d0000000000000000\") }",
                "cursorMethods": {
                    "project": "{ _id : 1, name : 1, age: 1, datetimeData: 1 }",
                    "sort": "{ age : 1 }",
                    "skip": 3,
                    "limit": 3
                }
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]
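Conceptually, the source options compose the way a MongoDB cursor does: the filter selects documents, sort orders them, skip and limit window the result, and project trims the returned fields. The in-memory model below illustrates that ordering (the helper is hypothetical and models only simple equality filters, not real query operators or ADF code):

```python
def apply_cursor_methods(docs, flt=None, sort=None, skip=0, limit=None, project=None):
    """Illustrative in-memory model of how the source options combine:
    filter → sort → skip → limit → project. Hypothetical helper only."""
    # Equality-only filter (real MongoDB filters support query operators).
    out = [d for d in docs if all(d.get(k) == v for k, v in (flt or {}).items())]
    # Apply sort keys in reverse so the first key has highest precedence
    # (relies on Python's stable sort); direction 1 = ascending, -1 = descending.
    for key, direction in reversed(sort or []):
        out.sort(key=lambda d: d.get(key), reverse=(direction == -1))
    out = out[skip:]
    if limit is not None:
        out = out[:limit]
    if project:
        out = [{k: d[k] for k in project if k in d} for d in out]
    return out

docs = [{"name": n, "age": a} for n, a in
        [("ann", 40), ("bob", 25), ("cat", 31), ("dan", 28), ("eve", 35)]]
# Sort by age ascending, skip 3, limit 3, project name: the two oldest remain.
print(apply_cursor_methods(docs, sort=[("age", 1)], skip=3, limit=3,
                           project=["name"]))
# → [{'name': 'eve'}, {'name': 'ann'}]
```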

Export JSON documents as-is

You can use this MongoDB connector to export JSON documents as-is from a MongoDB collection to various file-based stores or to Azure Cosmos DB. To achieve such schema-agnostic copy, skip the "structure" (also called schema) section in the dataset and the schema mapping in the copy activity.

Schema mapping

To copy data from MongoDB to a tabular sink, refer to schema mapping.

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.