Pre-migration steps for data migrations from MongoDB to Azure Cosmos DB's API for MongoDB

Applies to: Azure Cosmos DB API for MongoDB

Before you migrate your data from MongoDB (either on-premises or in the cloud) to Azure Cosmos DB's API for MongoDB, you should:

  1. Read the key considerations about using Azure Cosmos DB's API for MongoDB
  2. Choose an option to migrate your data
  3. Estimate the throughput needed for your workloads
  4. Pick an optimal partition key for your data
  5. Understand the indexing policy that you can set on your data

If you have already completed the prerequisites above, you can migrate your MongoDB data to Azure Cosmos DB's API for MongoDB by using the Azure Database Migration Service. If you haven't created an account yet, see any of the quickstarts that walk through the account-creation steps.

Considerations when using Azure Cosmos DB's API for MongoDB

The following are specific characteristics of Azure Cosmos DB's API for MongoDB:

  • Capacity model: Database capacity on Azure Cosmos DB is based on a throughput-based model. Throughput is measured in Request Units per second (RU/s), where a Request Unit is a normalized measure of the cost of a database operation; the provisioned RU/s determine how much work can be executed against a collection each second. This capacity can be allocated at the database or collection level, and it can be provisioned either as a fixed allocation (standard provisioned throughput) or as autoscale provisioned throughput.

  • Request Units: Every database operation in Azure Cosmos DB has an associated Request Unit (RU) cost. When an operation executes, its cost is subtracted from the request units available in that second. If requests need more RUs than the currently provisioned RU/s allow, there are two options: increase the provisioned RUs, or wait until the next second starts and then retry the operation.

  • Elastic capacity: The capacity of a given collection or database can be changed at any time. This lets the database adapt elastically to the throughput requirements of your workload.

  • Automatic sharding: Azure Cosmos DB provides an automatic partitioning system that only requires a shard key (also called a partition key). The automatic partitioning mechanism is shared across all Azure Cosmos DB APIs and allows seamless data and throughput scaling through horizontal distribution.
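To make the per-second budgeting in the Request Units bullet concrete, the following sketch models it in a deliberately simplified way: it assumes the full provisioned RU/s is available at the start of each second and that an operation which doesn't fit into what is left waits for the next second. It illustrates the arithmetic only and is not how the service enforces rate limits; the function name scheduleByRuBudget is hypothetical.

```javascript
// Simplified model of the per-second Request Unit (RU) budget.
// Assumptions (not how the service is implemented): the full provisioned RU/s
// is available at the start of every second, and an operation that doesn't fit
// into what is left of the current second waits for the next second.
function scheduleByRuBudget(operationCosts, provisionedRuPerSecond) {
  const schedule = []; // schedule[i] = second in which operation i runs
  let second = 0;
  let remaining = provisionedRuPerSecond;
  for (const cost of operationCosts) {
    if (cost > provisionedRuPerSecond) {
      // Waiting can't help: this operation needs more RUs to be provisioned.
      throw new RangeError(
        `Operation costs ${cost} RUs but only ${provisionedRuPerSecond} RU/s are provisioned`
      );
    }
    if (cost > remaining) {
      // Budget for this second is exhausted: wait for the next second, then retry.
      second += 1;
      remaining = provisionedRuPerSecond;
    }
    remaining -= cost;
    schedule.push(second);
  }
  return schedule;
}

// Three 10-RU operations against 25 RU/s: two fit into second 0, the third waits.
console.log(scheduleByRuBudget([10, 10, 10], 25)); // [ 0, 0, 1 ]
```

An operation whose cost exceeds the whole provisioned RU/s can never fit, which is why the sketch throws in that case: increasing the RUs is the only option left.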

Migration options for Azure Cosmos DB's API for MongoDB

The Azure Database Migration Service for Azure Cosmos DB's API for MongoDB simplifies data migration by providing a fully managed hosting platform, migration monitoring options, and automatic throttling handling. The full list of options is as follows, grouped by migration type and solution, with the considerations for each:

Online migration with the Azure Database Migration Service
  • Makes use of the Azure Cosmos DB bulk executor library
  • Suitable for large datasets and takes care of replicating live changes
  • Works only with other MongoDB sources

Offline migration with the Azure Database Migration Service
  • Makes use of the Azure Cosmos DB bulk executor library
  • Suitable for large datasets
  • Works only with other MongoDB sources

Offline migration with Azure Data Factory
  • Easy to set up and supports multiple sources
  • Makes use of the Azure Cosmos DB bulk executor library
  • Suitable for large datasets
  • Lack of checkpointing means that any issue during the migration requires restarting the whole migration process
  • Lack of a dead-letter queue means that a few erroneous files could stop the entire migration process
  • Needs custom code to increase read throughput for certain data sources

Offline migration with existing Mongo tools (mongodump, mongorestore, Studio3T)
  • Easy to set up and integrate
  • Needs custom handling for throttles
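Custom throttle handling, as needed with the existing Mongo tools or with your own copy code, usually means retrying rate-limited requests with backoff. The sketch below assumes that throttled requests fail with error code 16500 (TooManyRequests), which is how Azure Cosmos DB's API for MongoDB reports rate limiting; withThrottleRetry, its options, and the delay values are hypothetical choices.

```javascript
// Sketch: retry an async operation when Azure Cosmos DB's API for MongoDB rate-limits it.
// Assumption: throttled requests surface as an error whose `code` is 16500
// (TooManyRequests); check the errors your driver actually reports.
const THROTTLED_ERROR_CODE = 16500;

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withThrottleRetry(operation, { maxRetries = 5, baseDelayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const throttled = err && err.code === THROTTLED_ERROR_CODE;
      if (!throttled || attempt >= maxRetries) {
        throw err; // not a throttle, or out of retries: surface the error
      }
      // Exponential backoff: baseDelayMs, 2x, 4x, ... before the next attempt.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Usage with the Node.js MongoDB driver (hypothetical `collection` and `batch`):
// await withThrottleRetry(() => collection.insertMany(batch, { ordered: false }));
```

Backing off exponentially gives the per-second RU budget time to refill instead of retrying into the same exhausted second. Note that retrying an unordered insertMany that partially succeeded can raise duplicate _id errors for documents already written; the sketch does not handle that case.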

Estimate the throughput needed for your workloads

In Azure Cosmos DB, throughput is provisioned in advance and is measured in Request Units (RUs) per second. Unlike VMs or on-premises servers, RUs are easy to scale up and down at any time, and you can change the number of provisioned RUs instantly. For more information, see Request units in Azure Cosmos DB.
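Because provisioned RUs can be changed at any time, one pattern is to raise a collection's throughput for the bulk load and lower it afterwards. A minimal sketch, assuming the Azure Cosmos DB extension command customAction: "UpdateCollection" with offerThroughput, sent through the Node.js MongoDB driver's db.command(); setCollectionThroughput and the collection name are hypothetical, and the sketch only covers a collection with its own manually provisioned (not shared or autoscale) throughput.

```javascript
// Sketch: change a collection's provisioned throughput from Node.js.
// Assumes the Azure Cosmos DB API for MongoDB extension command
// `customAction: "UpdateCollection"` with `offerThroughput`; `db` is a
// MongoDB driver Db instance and the collection name is hypothetical.
async function setCollectionThroughput(db, collectionName, ruPerSecond) {
  if (!Number.isInteger(ruPerSecond) || ruPerSecond <= 0) {
    throw new RangeError("ruPerSecond must be a positive integer");
  }
  return db.command({
    customAction: "UpdateCollection",
    collection: collectionName,
    offerThroughput: ruPerSecond,
  });
}

// Example: raise throughput before a bulk load, then lower it afterwards.
// await setCollectionThroughput(db, "orders", 10000);
// ... run the migration ...
// await setCollectionThroughput(db, "orders", 1000);
```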

You can use the Azure Cosmos DB Capacity Calculator to determine the number of Request Units you need based on your database account configuration, amount of data, document size, and required reads and writes per second.

The following are key factors that affect the number of required RUs:

  • Document size: As the size of an item/document increases, the number of RUs consumed to read or write the item/document also increases.

  • Document property count: The number of RUs consumed to create or update a document is related to the number, complexity, and length of its properties. You can reduce the request unit consumption for write operations by limiting the number of indexed properties.

  • Query patterns: The complexity of a query affects how many request units the query consumes.
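Once you have measured typical request charges, a rough first estimate of the RU/s you need is the sum, over operation types, of operations per second times average charge, plus headroom. The sketch below assumes that simple additive model; the workload numbers and the 20% headroom are hypothetical, and the Capacity Calculator remains the better tool for a full estimate.

```javascript
// Rough RU/s estimate: sum of (operations per second x average request charge),
// plus a headroom percentage. Assumes charges were measured on sample data.
function estimateRuPerSecond(workload, headroomPercent = 20) {
  let total = 0;
  for (const { opsPerSecond, avgRequestCharge } of workload) {
    total += opsPerSecond * avgRequestCharge;
  }
  // An integer percentage keeps the arithmetic exact for whole-number inputs.
  return Math.ceil((total * (100 + headroomPercent)) / 100);
}

// Hypothetical workload; replace the charges with values you measured.
const workload = [
  { name: "point read", opsPerSecond: 500, avgRequestCharge: 1 },
  { name: "insert", opsPerSecond: 100, avgRequestCharge: 6 },
  { name: "filtered query", opsPerSecond: 20, avgRequestCharge: 10 },
];
console.log(estimateRuPerSecond(workload)); // 1560
```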

The best way to understand the cost of queries is to load sample data into Azure Cosmos DB, run sample queries from the MongoDB shell, and then use the getLastRequestStatistics command to get the request charge, which reports the number of RUs the last request consumed:

db.runCommand({getLastRequestStatistics: 1})

This command outputs a JSON document similar to the following:

{ "_t": "GetRequestStatisticsResponse", "ok": 1, "CommandName": "find", "RequestCharge": 10.1, "RequestDurationInMilliSeconds": 7.2}
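When you run several sample queries, it helps to aggregate the RequestCharge values you collect. This sketch assumes each collected response has the shape of the example output above (CommandName and RequestCharge fields); summarizeRequestCharges is a hypothetical helper, and how you gather the responses is left open.

```javascript
// Sketch: aggregate request charges collected from getLastRequestStatistics.
// Assumes each response carries CommandName and RequestCharge fields.
function summarizeRequestCharges(responses) {
  const byCommand = Object.create(null); // command name -> stats
  for (const { CommandName, RequestCharge } of responses) {
    const entry = byCommand[CommandName] || { count: 0, total: 0, max: 0 };
    entry.count += 1;
    entry.total += RequestCharge;
    entry.max = Math.max(entry.max, RequestCharge);
    byCommand[CommandName] = entry;
  }
  for (const entry of Object.values(byCommand)) {
    entry.average = entry.total / entry.count; // mean RUs per request
  }
  return byCommand;
}

console.log(summarizeRequestCharges([
  { CommandName: "find", RequestCharge: 10 },
  { CommandName: "find", RequestCharge: 4 },
  { CommandName: "insert", RequestCharge: 6 },
]));
```

The average feeds a throughput estimate, while the maximum shows which individual queries are worth optimizing.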

You can also use diagnostic settings to understand the frequency and patterns of the queries executed against Azure Cosmos DB. The results from the diagnostic logs can be sent to a storage account, an Event Hubs instance, or Azure Log Analytics.

Choose your partition key

Partitioning, also known as sharding, is a key point of consideration before migrating data. Azure Cosmos DB uses fully managed partitioning to increase the capacity of a database to meet storage and throughput requirements. This feature doesn't require hosting or configuring routing servers.

As your data and throughput grow, the partitioning capability automatically adds capacity and rebalances the data accordingly. For details and recommendations on choosing the right partition key for your data, see the Choosing a Partition Key article.
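Before committing to a partition key, you can profile a candidate field on a sample of your source data: how many distinct values it has, how much of the data its most common value holds, and how many documents lack it. A heuristic sketch, assuming sampleDocs is a representative sample exported from the source database and the candidate key is a top-level field; profilePartitionKey and its output fields are hypothetical, and the profile says nothing about how requests (as opposed to documents) are distributed.

```javascript
// Heuristic pre-migration check for a candidate partition (shard) key.
// Assumes `sampleDocs` is a representative sample of the source collection.
function profilePartitionKey(sampleDocs, field) {
  const counts = new Map(); // serialized key value -> number of documents
  let missing = 0;
  for (const doc of sampleDocs) {
    const value = doc[field];
    if (value === undefined) {
      missing += 1;
      continue;
    }
    const key = JSON.stringify(value); // equal values share one logical partition
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const present = sampleDocs.length - missing;
  const largest = Math.max(0, ...counts.values());
  return {
    distinctValues: counts.size, // higher cardinality spreads data more evenly
    largestShare: present ? largest / present : 0, // fraction held by the most common value
    missing, // documents without the field
  };
}

console.log(profilePartitionKey([{ t: "a" }, { t: "a" }, { t: "a" }, { t: "b" }, {}], "t"));
```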

Index your data

Azure Cosmos DB's API for MongoDB server version 3.6 automatically indexes only the _id field. The _id index can't be dropped, and it automatically enforces the uniqueness of the _id field per shard key. To index additional fields, you apply the MongoDB index-management commands. This default indexing policy differs from the Azure Cosmos DB SQL API, which indexes all fields by default.

The indexing capabilities provided by Azure Cosmos DB include compound indexes, unique indexes, and time-to-live (TTL) indexes. The index management interface is mapped to the createIndex() command. Learn more in the Indexing in Azure Cosmos DB's API for MongoDB article.
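The index types listed above map to createIndex() calls. A sketch using the Node.js MongoDB driver's collection.createIndex(keys, options); the field names are hypothetical. The TTL index is created on _ts, the Azure Cosmos DB last-modified timestamp, which is the field Cosmos DB's API for MongoDB uses for TTL in its own examples; confirm the TTL and unique-index rules for your server version and for sharded collections before relying on this.

```javascript
// Sketch: compound, unique and TTL indexes via the Node.js MongoDB driver's
// collection.createIndex(keys, options). Field names are hypothetical.
async function createExampleIndexes(collection) {
  // Compound index for queries filtering on customerId and sorting by orderDate.
  await collection.createIndex({ customerId: 1, orderDate: -1 });
  // Unique index: create it while the collection is still empty.
  await collection.createIndex({ orderNumber: 1 }, { unique: true });
  // TTL index on _ts (Cosmos DB's last-modified timestamp): documents not
  // modified for 3600 seconds become eligible for deletion.
  await collection.createIndex({ _ts: 1 }, { expireAfterSeconds: 3600 });
}
```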

Azure Database Migration Service automatically migrates MongoDB collections with unique indexes. However, the unique indexes must be created before the migration, because Azure Cosmos DB does not support creating unique indexes on collections that already contain data. For more information, see Unique keys in Azure Cosmos DB.
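Because unique indexes can't be added once a collection holds data, a migration script can enforce the order: check that the target collection is empty, create the unique indexes, then start copying. A minimal sketch with the Node.js MongoDB driver's countDocuments() and createIndex(); prepareUniqueIndexes and its uniqueKeySpecs argument are hypothetical.

```javascript
// Sketch: create unique indexes on the target collection while it is still
// empty, and only then copy data. `target` is a Node.js MongoDB driver
// Collection; `uniqueKeySpecs` (e.g. [{ email: 1 }]) is hypothetical input.
async function prepareUniqueIndexes(target, uniqueKeySpecs) {
  const existing = await target.countDocuments({});
  if (existing > 0) {
    // Creating a unique index now would fail, so stop before copying anything.
    throw new Error(
      `Target collection already holds ${existing} documents; ` +
        "create unique indexes before migrating data."
    );
  }
  for (const keys of uniqueKeySpecs) {
    await target.createIndex(keys, { unique: true });
  }
}
```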

Next steps