Migrate your MongoDB data to Azure Cosmos DB

This tutorial shows how to migrate data stored in MongoDB to an Azure Cosmos DB account configured to use Cosmos DB's API for MongoDB. If you import data from MongoDB and plan to use it with the Azure Cosmos DB SQL API, use the Data Migration tool to import the data instead.

In this tutorial, you will:

  • Prepare a migration plan.
  • Migrate data by using mongoimport.
  • Migrate data by using mongorestore.

If you don't have an Azure subscription, create a trial account before you begin.

Prerequisites

Review and complete the following prerequisites before you start the migration.

Plan for the migration

This section describes how to plan for the data migration. We'll estimate the RU charges, determine the latency from your machine to the cloud service, and calculate the batch size and number of insertion workers.

Pre-create and scale your collections

Before you migrate with mongoimport or mongorestore, pre-create all your collections from the Azure portal or from MongoDB drivers and tools.

From the Azure portal, increase the throughput of your collections for the migration. With higher throughput, you can avoid being rate limited and migrate in less time. You can reduce the throughput immediately after the migration to save costs.
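To get a feel for how provisioned throughput affects migration time, the following is a rough sketch in Python. It assumes that every document costs about the same number of RUs to write and that writes are the only load on the collection; the function name and parameters are illustrative and not part of any Cosmos DB tool. The result is a lower bound that ignores network latency, retries, and rate limiting.

```python
def estimate_migration_seconds(document_count, ru_per_write, provisioned_ru_per_sec):
    """Lower-bound estimate of migration time in seconds.

    Assumption: each write costs ru_per_write RUs and the collection can
    spend provisioned_ru_per_sec RUs every second on writes alone.
    """
    if provisioned_ru_per_sec <= 0:
        raise ValueError("provisioned_ru_per_sec must be positive")
    total_ru = document_count * ru_per_write
    return total_ru / provisioned_ru_per_sec


# Hypothetical numbers: 1,000,000 documents at 10 RUs each on 10,000 RU/s.
seconds = estimate_migration_seconds(1_000_000, 10, 10_000)
print(f"At least {seconds:.0f} seconds")
```

Doubling the provisioned throughput in this model halves the estimate, which is why it can pay to scale up for the migration and scale back down afterward.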

In addition to provisioning throughput at the collection level, you can provision throughput at the database level so that a set of collections shares it. You need to pre-create the database and collections, and define a shard key for each collection in the shared throughput database.

You can create sharded collections using your preferred tool, driver, or SDK. In this example, we use the Mongo Shell to create a sharded collection:

db.runCommand( { shardCollection: "admin.people", key: { region: "hashed" } } )

The command returns the following results:

{
    "_t" : "ShardCollectionResponse",
    "ok" : 1,
    "collectionsharded" : "admin.people"
}

Calculate the approximate RU charge for a single document write

From the MongoDB Shell, connect to your Cosmos account configured to use Cosmos DB's API for MongoDB. You can find instructions in Connect a MongoDB application to Cosmos DB.

Next, run a sample insert command by using one of your sample documents:

db.coll.insert({ "playerId": "a067ff", "hashedid": "bb0091", "countryCode": "hk" })

Run the command db.runCommand({getLastRequestStatistics: 1}).

You receive a response like the following output:

globaldb:PRIMARY> db.runCommand({getLastRequestStatistics: 1})
{
    "_t": "GetRequestStatisticsResponse",
    "ok": 1,
    "CommandName": "insert",
    "RequestCharge": 10,
    "RequestDurationInMilliSeconds": NumberLong(50)
}

Take note of the request charge (the RequestCharge value, 10 RUs in this sample output).

Determine the latency from your machine to Cosmos DB

Enable verbose logging from the MongoDB Shell with the command setVerboseShell(true).

Run a basic query against the database with the command db.coll.find().limit(1).

You receive a response like the following output:

Fetched 1 record(s) in 100(ms)

Before you run the migration, remove the inserted document to ensure there are no duplicate documents. You can remove documents with the command db.coll.remove({}), which removes every document in the collection.
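The planning formulas later in this section take the latency in seconds, while the verbose shell reports it in milliseconds. As a small sketch, the following Python helper (a hypothetical name, not part of the MongoDB tooling) extracts the value from a line shaped like the sample output above and converts it. It assumes the line ends with the pattern "in N(ms)".

```python
import re


def latency_seconds(shell_line):
    """Extract the elapsed time from a verbose shell line and return seconds.

    Assumption: the line contains 'in <integer>(ms)', as in the sample output.
    """
    match = re.search(r"in (\d+)\(ms\)", shell_line)
    if match is None:
        raise ValueError(f"no elapsed time found in: {shell_line!r}")
    return int(match.group(1)) / 1000


print(latency_seconds("Fetched 1 record(s) in 100(ms)"))
```

A single measurement can be noisy, so it may be worth running the query a few times and using a typical value.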

Calculate the approximate values for the batchSize and numInsertionWorkers properties

For the batchSize property, divide the total provisioned throughput (RUs/sec) by the RUs consumed for a single document write, as measured in the section "Calculate the approximate RU charge for a single document write." If the calculated value is less than or equal to 24, use that number as the property value. If the calculated value is greater than 24, set the property value to 24.
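The batchSize rule above can be sketched in a few lines of Python. The function name is illustrative. Rounding the quotient down to a whole number and never returning less than 1 are assumptions of this sketch; the text only specifies the cap of 24.

```python
MAX_BATCH_SIZE = 24  # cap stated in the text above


def batch_size(provisioned_ru_per_sec, ru_per_write):
    """Provisioned throughput divided by the per-write RU charge, capped at 24.

    Assumptions: the quotient is rounded down, and the result is at least 1.
    """
    calculated = int(provisioned_ru_per_sec // ru_per_write)
    return max(1, min(MAX_BATCH_SIZE, calculated))


print(batch_size(10_000, 10))  # quotient is 1000, so the cap applies
print(batch_size(200, 10))     # quotient is 20, below the cap
```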

For the value of the numInsertionWorkers property, use this equation:

numInsertionWorkers = (Provisioned RUs throughput * Latency in seconds) / (batchSize * Consumed RUs for a single write)

We can use the following values to calculate a value for the numInsertionWorkers property:

Property               Value
batchSize              24
Provisioned RUs        10,000
Latency                0.100 s
Consumed RUs           10 RUs
numInsertionWorkers    (10,000 RUs x 0.100 s) / (24 x 10 RUs) ≈ 4.17
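The same arithmetic, written as a small Python sketch so that you can plug in your own measurements. The function names are illustrative. Rounding the result down to a whole worker count, with a minimum of 1, is an assumption of this sketch; it matches the value of 4 used in the sample commands in this article.

```python
import math


def insertion_workers_raw(provisioned_ru_per_sec, latency_seconds,
                          batch_size, ru_per_write):
    """(Provisioned RUs * latency in seconds) / (batchSize * RUs per write)."""
    return (provisioned_ru_per_sec * latency_seconds) / (batch_size * ru_per_write)


def insertion_workers(provisioned_ru_per_sec, latency_seconds,
                      batch_size, ru_per_write):
    """Whole number of workers: rounded down, at least 1 (assumption)."""
    raw = insertion_workers_raw(provisioned_ru_per_sec, latency_seconds,
                                batch_size, ru_per_write)
    return max(1, math.floor(raw))


# Values from the table above.
print(insertion_workers_raw(10_000, 0.100, 24, 10))
print(insertion_workers(10_000, 0.100, 24, 10))
```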

Run the mongoimport migration command. The command parameters are described later in this article.

mongoimport.exe --host cosmosdb-mongodb-account.documents.azure.cn:10255 -u cosmosdb-mongodb-account -p <Your_MongoDB_password> --ssl --sslAllowInvalidCertificates --jsonArray --db databasename --collection collectionName --file "C:\sample.json" --numInsertionWorkers 4 --batchSize 24

You can also use the mongorestore command. Make sure all collections have their throughput set at or above the number of RUs used in the previous calculations.

mongorestore.exe --host cosmosdb-mongodb-account.documents.azure.cn:10255 -u cosmosdb-mongodb-account -p <Your_MongoDB_password> --ssl --sslAllowInvalidCertificates ./dumps/dump-2016-12-07 --numInsertionWorkersPerCollection 4 --batchSize 24

Complete the prerequisites

After you plan for the migration, complete the following steps:

  • Get sample data: Make sure you have some sample data before you start the migration.

  • Increase throughput: The duration of your data migration depends on the amount of throughput you provision for an individual collection or database. Be sure to increase the throughput for larger data migrations. After you complete the migration, decrease the throughput to save costs.

  • Enable SSL: Cosmos DB has strict security requirements and standards. Be sure to enable SSL when you interact with your Cosmos account. The procedures in this article include how to enable SSL for the mongoimport and mongorestore commands.

  • Create Cosmos DB resources: Before you start the migration, pre-create all your collections from the Azure portal. If you migrate to a Cosmos account that has database-level provisioned throughput, make sure to provide a partition key when you create the collections.

  • Get your connection string: In the Azure portal, select the Azure Cosmos DB entry on the left. Under Subscriptions, select your account name. Under Connection String, select Connection String. The right side of the portal shows the information you need to connect to your account:

    [Screenshot: connection string information]
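If you copy the full connection string rather than the individual fields, a short Python sketch like the following can split it into the host, user name, and password values that the mongoimport and mongorestore commands expect. It assumes the string has the usual mongodb://user:password@host:port/ shape and that any special characters in the password are percent-encoded; the function name and the sample string are made up for illustration.

```python
from urllib.parse import unquote, urlsplit


def split_connection_string(uri):
    """Return host:port, user name, and password from a mongodb:// URI.

    Assumption: the password is percent-encoded if it contains characters
    such as '/', '@', or '='.
    """
    parts = urlsplit(uri)
    if parts.scheme != "mongodb" or parts.hostname is None or parts.port is None:
        raise ValueError("expected mongodb://user:password@host:port/...")
    return {
        "host": f"{parts.hostname}:{parts.port}",
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
    }


# Hypothetical connection string, not a real account.
sample = "mongodb://demo-account:s3cr3t%3D%3D@demo-account.documents.azure.cn:10255/?ssl=true"
print(split_connection_string(sample)["host"])
```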

Use mongoimport

To import data into your Cosmos account, use the following template.

mongoimport.exe --host <your_hostname>:10255 -u <your_username> -p <your_password> --db <your_database> --collection <your_collection> --ssl --sslAllowInvalidCertificates --type json --file "C:\sample.json"

Replace the <your_hostname>, <your_username>, and <your_password> parameters with the specific values for your account. In the following example, we use sampleDB as the value for <your_database>, and sampleColl as the value for <your_collection>:

mongoimport.exe --host cosmosdb-mongodb-account.documents.azure.cn:10255 -u cosmosdb-mongodb-account -p <Your_MongoDB_password> --ssl --sslAllowInvalidCertificates --db sampleDB --collection sampleColl --type json --file "C:\Users\admin\Desktop\*.json"
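If you script the import, it can help to assemble the command as an argument list instead of a single string, so that passwords and paths with spaces need no shell quoting. The following Python sketch builds the same arguments as the template above. The function name and its parameters are illustrative, and the executable name defaults to the one used in this article.

```python
def mongoimport_args(host, username, password, database, collection,
                     file_path, executable="mongoimport.exe"):
    """Build the mongoimport argument list used in the template above."""
    return [
        executable,
        "--host", host,
        "-u", username,
        "-p", password,
        "--db", database,
        "--collection", collection,
        "--ssl", "--sslAllowInvalidCertificates",
        "--type", "json",
        "--file", file_path,
    ]


# Placeholder values; replace them with the values for your account.
args = mongoimport_args(
    "cosmosdb-mongodb-account.documents.azure.cn:10255",
    "cosmosdb-mongodb-account",
    "<Your_MongoDB_password>",
    "sampleDB",
    "sampleColl",
    r"C:\sample.json",
)
print(" ".join(args))
```

To run the import, you could pass the list to subprocess.run(args, check=True); the sketch only prints the command so that you can review it first.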

Use mongorestore

To restore data to your Cosmos account configured with Cosmos DB's API for MongoDB, use the following template to execute the import.

mongorestore.exe --host <your_hostname>:10255 -u <your_username> -p <your_password> --db <your_database> --collection <your_collection> --ssl --sslAllowInvalidCertificates <path_to_backup>

Replace the <your_hostname>, <your_username>, and <your_password> parameters with the specific values for your account. In the following example, we use ./dumps/dump-2016-12-07 as the value for <path_to_backup>:

mongorestore.exe --host cosmosdb-mongodb-account.documents.azure.cn:10255 -u cosmosdb-mongodb-account -p <Your_MongoDB_password> --db mydatabase --collection mycollection --ssl --sslAllowInvalidCertificates ./dumps/dump-2016-12-07

Clean up resources

When you no longer need the resources created in this tutorial, you can delete the resource group, Cosmos account, and all the related resources. Use the following steps to delete the resource group:

  1. Go to the resource group where you created the Cosmos account.
  2. Select Delete resource group.
  3. Confirm the name of the resource group to delete, and select Delete.

Next steps

Continue to the next tutorial to learn how to query data using Azure Cosmos DB's API for MongoDB.