Copy data from MongoDB using Azure Data Factory

APPLIES TO: Azure Data Factory, Azure Synapse Analytics

This article outlines how to use the Copy activity in Azure Data Factory to copy data from a MongoDB database. It builds on the copy activity overview article, which presents a general overview of the copy activity.

Important

ADF released this new version of the MongoDB connector, which provides better native MongoDB support. If you are using the previous MongoDB connector in your solution, which is supported as-is for backward compatibility, refer to the MongoDB connector (legacy) article.

Supported capabilities

You can copy data from a MongoDB database to any supported sink data store. For a list of data stores that are supported as sources and sinks by the copy activity, see the Supported data stores table.

Specifically, this MongoDB connector supports versions up to 4.2.

Prerequisites

If your data store is located inside an on-premises network, an Azure virtual network, or Amazon Virtual Private Cloud, you need to configure a self-hosted integration runtime to connect to it.

Alternatively, if your data store is a managed cloud data service, you can use the Azure integration runtime. If access is restricted to IPs that are approved in the firewall rules, you can add the Azure Integration Runtime IPs to the allow list.

For more information about the network security mechanisms and options supported by Data Factory, see Data access strategies.

Getting started

To perform the Copy activity with a pipeline, you can use one of the following tools or SDKs:

The following sections provide details about properties that are used to define Data Factory entities specific to the MongoDB connector.

Linked service properties

The following properties are supported for the MongoDB linked service:

| Property | Description | Required |
|:--- |:--- |:--- |
| type | The type property must be set to: MongoDbV2 | Yes |
| connectionString | Specify the MongoDB connection string, e.g. `mongodb://[username:password@]host[:port][/[database][?options]]`. Refer to the MongoDB manual on connection strings for more details. You can also put a connection string in Azure Key Vault. Refer to Store credentials in Azure Key Vault for more details. | Yes |
| database | Name of the database that you want to access. | Yes |
| connectVia | The Integration Runtime to be used to connect to the data store. Learn more from the Prerequisites section. If not specified, the default Azure Integration Runtime is used. | No |

Example:

{
    "name": "MongoDBLinkedService",
    "properties": {
        "type": "MongoDbV2",
        "typeProperties": {
            "connectionString": "mongodb://[username:password@]host[:port][/[database][?options]]",
            "database": "myDatabase"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
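If the username or password contains reserved URI characters (such as `@` or `:`), they must be percent-encoded before being embedded in the connection string. A minimal Python sketch of assembling such a string (the host, credentials, and database name below are made-up placeholders):

```python
from urllib.parse import quote_plus

def build_mongodb_uri(username, password, host, port=27017, database="", options=""):
    """Assemble a MongoDB connection string, percent-encoding the credentials.

    Reserved characters such as ':', '/', or '@' in the username or password
    would otherwise break URI parsing.
    """
    credentials = f"{quote_plus(username)}:{quote_plus(password)}@" if username else ""
    suffix = f"/{database}" if database or options else ""
    if options:
        suffix += f"?{options}"
    return f"mongodb://{credentials}{host}:{port}{suffix}"

# 'p@ss:word' contains '@' and ':' and must be escaped.
uri = build_mongodb_uri("admin", "p@ss:word", "example.mongo.local", database="myDatabase")
```

The resulting string can then be pasted into the `connectionString` property or stored as a secret in Azure Key Vault.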

Dataset properties

For a full list of sections and properties that are available for defining datasets, see Datasets and linked services. The following properties are supported for the MongoDB dataset:

| Property | Description | Required |
|:--- |:--- |:--- |
| type | The type property of the dataset must be set to: MongoDbV2Collection | Yes |
| collectionName | Name of the collection in the MongoDB database. | Yes |

Example:

{
    "name": "MongoDbDataset",
    "properties": {
        "type": "MongoDbV2Collection",
        "typeProperties": {
            "collectionName": "<Collection name>"
        },
        "schema": [],
        "linkedServiceName": {
            "referenceName": "<MongoDB linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the MongoDB source.

MongoDB as source

The following properties are supported in the copy activity source section:

| Property | Description | Required |
|:--- |:--- |:--- |
| type | The type property of the copy activity source must be set to: MongoDbV2Source | Yes |
| filter | Specifies the selection filter using query operators. To return all documents in a collection, omit this parameter or pass an empty document ({}). | No |
| cursorMethods.project | Specifies the fields to return in the documents for projection. To return all fields in the matching documents, omit this parameter. | No |
| cursorMethods.sort | Specifies the order in which the query returns matching documents. Refer to cursor.sort(). | No |
| cursorMethods.limit | Specifies the maximum number of documents the server returns. Refer to cursor.limit(). | No |
| cursorMethods.skip | Specifies the number of documents to skip and from where MongoDB begins to return results. Refer to cursor.skip(). | No |
| batchSize | Specifies the number of documents to return in each batch of the response from the MongoDB instance. In most cases, modifying the batch size will not affect the user or the application. Cosmos DB limits each batch to no more than 40 MB (the sum of the sizes of the batchSize documents), so decrease this value if your documents are large. | No (the default is 100) |
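The interaction between batchSize and the 40-MB per-batch cap can be estimated with simple arithmetic. A hedged sketch (the average document size below is an invented figure for illustration):

```python
def max_safe_batch_size(avg_doc_size_bytes, limit_bytes=40 * 1024 * 1024):
    """Largest batchSize whose total payload stays within the per-batch limit.

    limit_bytes defaults to the 40 MB cap mentioned above; avg_doc_size_bytes
    is an estimate you would measure from your own collection.
    """
    return max(1, limit_bytes // avg_doc_size_bytes)

# With ~512 KB documents, the default batchSize of 100 would total ~50 MB,
# exceeding the cap; a batchSize of at most 80 keeps each batch within 40 MB.
print(max_safe_batch_size(512 * 1024))
```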

Tip

ADF supports consuming BSON documents in Strict mode. Make sure your filter query is in Strict mode instead of Shell mode. More details can be found in the MongoDB manual.
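One way to keep a filter in Strict mode is to build it as a plain data structure and serialize it with a JSON library, expressing dates in the Extended JSON `$date` form rather than with shell helpers. A hedged Python sketch (the field name and date range are illustrative):

```python
import json

# Extended JSON (Strict mode) represents a date as {"$date": "<ISO-8601>"}
# instead of the mongo-shell helper ISODate("...").
filter_doc = {
    "datetimeData": {
        "$gte": {"$date": "2018-12-11T00:00:00.000Z"},
        "$lt": {"$date": "2018-12-12T00:00:00.000Z"},
    }
}
# json.dumps guarantees the output is valid JSON, which Shell-mode helpers are not.
strict_filter = json.dumps(filter_doc)
```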

Example:

"activities":[
    {
        "name": "CopyFromMongoDB",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<MongoDB input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "MongoDbV2Source",
                "filter": "{datetimeData: {$gte: ISODate(\"2018-12-11T00:00:00.000Z\"),$lt: ISODate(\"2018-12-12T00:00:00.000Z\")}, _id: ObjectId(\"5acd7c3d0000000000000000\") }",
                "cursorMethods": {
                    "project": "{ _id : 1, name : 1, age: 1, datetimeData: 1 }",
                    "sort": "{ age : 1 }",
                    "skip": 3,
                    "limit": 3
                }
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]
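The cursor methods in the example above compose into a pipeline: matching documents are sorted, then skipped, then limited, then projected. A small in-memory Python sketch of those semantics (the sample documents and helper name are invented for illustration, not part of the connector):

```python
def apply_cursor_methods(docs, sort_key=None, skip=0, limit=None, project=None):
    """Mimic MongoDB cursor semantics on a list of dicts: sort, skip, limit, project."""
    result = sorted(docs, key=lambda d: d[sort_key]) if sort_key else list(docs)
    result = result[skip:]                       # cursor.skip()
    if limit is not None:
        result = result[:limit]                  # cursor.limit()
    if project:
        result = [{k: d[k] for k in project if k in d} for d in result]
    return result

docs = [{"name": f"user{i}", "age": 20 + i, "extra": i} for i in range(10)]
# Mirrors the example's settings: sort by age ascending, skip 3, limit 3,
# and project only the name and age fields.
page = apply_cursor_methods(docs, sort_key="age", skip=3, limit=3, project=["name", "age"])
```

This is only an analogy for reading the configuration; the actual filtering and paging happen server-side in MongoDB.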

Export JSON documents as-is

You can use this MongoDB connector to export JSON documents as-is from a MongoDB collection to various file-based stores or to Azure Cosmos DB. To achieve such schema-agnostic copy, skip the "structure" (also called schema) section in the dataset and the schema mapping in the copy activity.

Schema mapping

To copy data from MongoDB to a tabular sink, refer to schema mapping.

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.