Migrate hundreds of terabytes of data into Azure Cosmos DB

Azure Cosmos DB can store terabytes of data. You can perform a large-scale data migration to move your production workload to Azure Cosmos DB. This article describes the challenges involved in moving large-scale data to Azure Cosmos DB and introduces a tool that can help you overcome these challenges and migrate data into Azure Cosmos DB. In this case study, the customer used the Cosmos DB SQL API.

Before you migrate the entire workload to Azure Cosmos DB, you can migrate a subset of data to validate some of the aspects like partition key choice, query performance, and data modeling. After you validate the proof of concept, you can move the entire workload to Azure Cosmos DB.

Tools for data migration

Azure Cosmos DB migration strategies currently differ based on the API choice and the size of the data. To migrate smaller datasets, for example to validate data modeling, query performance, or partition key choice, you can use the Data Migration Tool or Azure Data Factory's Azure Cosmos DB connector. If you are familiar with Spark, you can also use the Azure Cosmos DB Spark connector to migrate data.

Challenges for large-scale migrations

The existing tools for migrating data to Azure Cosmos DB have some limitations that become especially apparent at large scales:

  • Limited scale out capabilities: In order to migrate terabytes of data into Azure Cosmos DB as quickly as possible, and to effectively consume the entire provisioned throughput, the migration clients should have the ability to scale out indefinitely.

  • Lack of progress tracking and check-pointing: It is important to track the migration progress and have check-pointing while migrating large data sets. Otherwise, any error that occurs during the migration will stop it, and you have to start the process from scratch. It isn't productive to restart the whole migration when 99% of it has already completed.

  • Lack of dead letter queue: Within large data sets, there can be issues with parts of the source data. Additionally, there might be transient issues with the client or the network. Neither of these cases should cause the entire migration to fail. Even though most migration tools have robust retry capabilities that guard against intermittent issues, that is not always enough. For example, if even less than 0.01% of the source documents are larger than 2 MB, those document writes will fail in Azure Cosmos DB. Ideally, the migration tool should persist these 'failed' documents to a dead letter queue that can be processed after the migration.

Many of these limitations are being addressed for tools like Azure Data Factory and Azure Data Migration services.

Custom tool with bulk executor library

The challenges described in the previous section can be solved by using a custom tool that can be easily scaled out across multiple instances and is resilient to transient failures. Additionally, the custom tool can pause and resume the migration at various checkpoints. Azure Cosmos DB already provides the bulk executor library, which incorporates some of these features. For example, the bulk executor library already has the functionality to handle transient errors and can scale out threads in a single node to consume about 500 K RUs per node. The bulk executor library also partitions the source dataset into micro-batches that are operated on independently as a form of checkpointing.

The custom tool uses the bulk executor library, supports scaling out across multiple clients, and tracks errors during the ingestion process. To use this tool, the source data should be partitioned into distinct files in Azure Data Lake Storage (ADLS) so that different migration workers can pick up each file and ingest it into Azure Cosmos DB. The custom tool uses a separate collection that stores metadata about the migration progress for each individual source file in ADLS and tracks any errors associated with it.

The migration process with this custom tool works as follows. The tool runs on a set of virtual machines, and each virtual machine queries the tracking collection in Azure Cosmos DB to acquire a lease on one of the source data partitions. Once the lease is acquired, the tool reads the source data partition and ingests it into Azure Cosmos DB by using the bulk executor library. Next, the tracking collection is updated to record the progress of data ingestion and any errors encountered. After a data partition is processed, the tool queries for the next available source partition. It continues processing source partitions until all the data is migrated. The source code for the tool is available here.
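The per-worker control flow can be illustrated with the following simplified sketch. It is written in Python only for brevity (the actual tool is built on the bulk executor library), and acquire_lease, read_partition, and bulk_ingest are hypothetical placeholders for the lease, read, and ingest logic. The tracking_container argument is assumed to be an azure-cosmos ContainerProxy for the tracking collection whose document schema is shown in the next section:

import socket

# Simplified worker loop over the tracking collection. acquire_lease,
# read_partition, and bulk_ingest are hypothetical helpers; the real tool
# performs the ingestion through the bulk executor library.
def run_worker(tracking_container, acquire_lease, read_partition, bulk_ingest):
    worker_id = socket.gethostname()
    while True:
        # Find a source data partition that no worker has claimed yet.
        pending = list(tracking_container.query_items(
            query="SELECT TOP 1 * FROM c WHERE c.isComplete = false AND c.isInProgress = false",
            enable_cross_partition_query=True))
        if not pending:
            break  # Everything is either ingested or claimed by another worker.

        tracking_doc = pending[0]
        if not acquire_lease(tracking_container, tracking_doc, worker_id):
            continue  # Another worker claimed this partition first; try again.

        # Read the source data partition and ingest it into Azure Cosmos DB.
        documents = read_partition(tracking_doc["location"])
        result = bulk_ingest(documents)

        # Checkpoint: record progress and any errors in the tracking collection.
        tracking_doc["jsonStoreEntityImportResponse"] = result
        tracking_doc["isInProgress"] = False
        tracking_doc["isComplete"] = not result["isError"]
        tracking_container.replace_item(item=tracking_doc, body=tracking_doc)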

Migration tool setup

The tracking collection contains documents like the following example, one for each partition in the source data. Each document contains the metadata for the source data partition, such as its location, migration status, and errors (if any):

{ 
  "owner": "25812@bulkimporttest07", 
  "jsonStoreEntityImportResponse": { 
    "numberOfDocumentsReceived": 446688, 
    "isError": false, 
    "totalRequestUnitsConsumed": 3950252.2800000003, 
    "errorInfo": [], 
    "totalTimeTakenInSeconds": 188, 
    "numberOfDocumentsImported": 446688 
  }, 
  "storeType": "AZURE_BLOB", 
  "name": "sourceDataPartition", 
  "location": "sourceDataPartitionLocation", 
  "id": "sourceDataPartitionId", 
  "isInProgress": false, 
  "operation": "unpartitioned-writes", 
  "createDate": { 
    "seconds": 1561667225, 
    "nanos": 146000000 
  }, 
  "completeDate": { 
    "seconds": 1561667515, 
    "nanos": 180000000 
  }, 
  "isComplete": true 
} 
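
The same tracking collection can also be queried to monitor overall progress or to list the source partitions that reported errors. The following is a minimal sketch using the azure-cosmos Python SDK; the endpoint, key, and database and container names are placeholders:

from azure.cosmos import CosmosClient

# Placeholder endpoint, key, and names; point them at your tracking collection.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
tracking = client.get_database_client("migration-metadata").get_container_client("tracking")

# Count the source data partitions that are fully ingested.
completed = list(tracking.query_items(
    query="SELECT VALUE COUNT(1) FROM c WHERE c.isComplete = true",
    enable_cross_partition_query=True))[0]

# List the partitions whose import reported an error, so they can be re-processed.
failed = list(tracking.query_items(
    query="SELECT c.id, c.location FROM c WHERE c.jsonStoreEntityImportResponse.isError = true",
    enable_cross_partition_query=True))
print(f"{completed} partitions complete, {len(failed)} partitions with errors")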

Prerequisites for data migration

Before the data migration starts, there are a few prerequisites to consider:

Estimate the data size:

The source data size may not map exactly to the data size in Azure Cosmos DB. You can insert a few sample documents from the source to check their size in Azure Cosmos DB. Based on the sample document size, you can estimate the total data size in Azure Cosmos DB after migration.

For example, if each document after migration into Azure Cosmos DB is around 1 KB and there are around 60 billion documents in the source dataset, the estimated size in Azure Cosmos DB would be close to 60 TB.

Pre-create containers with enough RUs:

Although Azure Cosmos DB scales out storage automatically, it is not advisable to start from the smallest container size. Smaller containers have lower throughput availability, which means that the migration would take much longer to complete. Instead, it is useful to create the containers with the final data size (as estimated in the previous step) and make sure that the migration workload is fully consuming the provisioned throughput.

In the previous step, the data size was estimated to be around 60 TB, so a container of at least 2.4 million RUs is required to accommodate the entire dataset.
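
If you use the SQL API SDKs, the target container can be created up front with the required throughput. The following is a minimal sketch using the azure-cosmos Python SDK; the endpoint, key, database and container names, and the /partitionKey path are placeholders, and it assumes your account's throughput quota allows this much provisioned throughput:

from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint, key, names, and partition key path.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.create_database_if_not_exists("migration-db")

# Pre-create the target container sized for the final dataset (2.4 million RU/s
# for the ~60 TB estimate in this example) instead of starting small.
container = database.create_container_if_not_exists(
    id="target-container",
    partition_key=PartitionKey(path="/partitionKey"),
    offer_throughput=2_400_000,
)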

Estimate the migration speed:

Assuming that the migration workload can consume the entire provisioned throughput, the provisioned throughput provides an estimate of the migration speed. Continuing the previous example, 5 RUs are required to write a 1-KB document to an Azure Cosmos DB SQL API account. 2.4 million RUs would allow a transfer of 480,000 documents per second (or 480 MB/s). This means that the complete migration of 60 TB would take 125,000 seconds, or about 34 hours.

If you want the migration to be completed within a day, you should increase the provisioned throughput to 5 million RUs.
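
The arithmetic behind these estimates can be checked with a few lines of code. The figures below (1-KB documents, 5 RUs per write, 60 billion documents) are the example values used in this article:

# Example values from this article: 1-KB documents, 5 RUs per 1-KB write.
doc_size_kb = 1
ru_per_write = 5
total_docs = 60_000_000_000          # roughly 60 billion documents, about 60 TB

for provisioned_rus in (2_400_000, 5_000_000):
    docs_per_second = provisioned_rus / ru_per_write
    throughput_mb_per_second = docs_per_second * doc_size_kb / 1_000
    total_hours = total_docs / docs_per_second / 3_600
    print(f"{provisioned_rus:,} RU/s -> {docs_per_second:,.0f} docs/s "
          f"({throughput_mb_per_second:,.0f} MB/s), about {total_hours:.1f} hours")

# Output:
# 2,400,000 RU/s -> 480,000 docs/s (480 MB/s), about 34.7 hours
# 5,000,000 RU/s -> 1,000,000 docs/s (1,000 MB/s), about 16.7 hours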

Turn off the indexing:

Since the migration should be completed as soon as possible, it is advisable to minimize the time and RUs spent on creating indexes for each ingested document. Because Azure Cosmos DB automatically indexes all properties, it is worthwhile to limit indexing to a few selected terms or to turn it off completely for the duration of the migration. You can turn off indexing by changing the container's indexingMode to none, as shown below:

  { 
        "indexingMode": "none" 
  } 

After the migration is complete, you can update the indexing policy.
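
For example, with the azure-cosmos Python SDK the policy can be switched back to consistent, automatic indexing once the bulk ingestion has finished. This is a sketch only; the endpoint, key, and database and container names match the placeholders used earlier:

from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint, key, and names; the partition key must match the container's.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("migration-db")

# Re-enable automatic, consistent indexing after the migration completes.
database.replace_container(
    container="target-container",
    partition_key=PartitionKey(path="/partitionKey"),
    indexing_policy={"indexingMode": "consistent", "automatic": True},
)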

Migration process

After the prerequisites are completed, you can migrate data with the following steps:

  1. First, import the data from the source into Azure Blob Storage. To increase the speed of migration, it is helpful to parallelize across distinct source partitions. Before starting the migration, the source data set should be partitioned into files of around 200 MB each (a simple way to do this is sketched after this list).

  2. The bulk executor library can scale up to consume 500,000 RUs in a single client VM. Since the available throughput is 5 million RUs, 10 Ubuntu 16.04 VMs (Standard_D32_v3) should be provisioned in the same region where your Azure Cosmos database is located. You should prepare these VMs with the migration tool and its settings file.

  3. Run the queue step on one of the client virtual machines. This step creates the tracking collection by scanning the ADLS container and creating a progress-tracking document for each of the source data set's partition files.

  4. Next, run the import step on all the client VMs. Each client can take ownership of a source partition and ingest its data into Azure Cosmos DB. Once the partition is completed and its status is updated in the tracking collection, the client queries for the next available source partition in the tracking collection.

  5. This process continues until the entire set of source partitions has been ingested. Once all the source partitions are processed, the tool should be rerun in error-correction mode on the same tracking collection. This step identifies the source partitions that should be re-processed because of errors.

  6. Some of these errors could be due to incorrect documents in the source data. These should be identified and fixed. Next, rerun the import step on the failed partitions to re-ingest them.
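
For step 1, one simple way to split a newline-delimited JSON export into roughly 200-MB files before uploading them to the storage account is sketched below. The input file and output directory names are placeholders:

import os

# Split a newline-delimited JSON export (one document per line) into ~200-MB
# chunks that individual migration workers can later claim and ingest.
# "source-export.jsonl" and "partitions" are placeholder names.
CHUNK_BYTES = 200 * 1024 * 1024

os.makedirs("partitions", exist_ok=True)
chunk_index, bytes_written = 0, 0
out = open("partitions/part-00000.jsonl", "wb")
with open("source-export.jsonl", "rb") as source:
    for line in source:
        if bytes_written + len(line) > CHUNK_BYTES:
            out.close()
            chunk_index += 1
            bytes_written = 0
            out = open(f"partitions/part-{chunk_index:05d}.jsonl", "wb")
        out.write(line)
        bytes_written += len(line)
out.close()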

Once the migration is completed, you can validate that the document count in Azure Cosmos DB matches the document count in the source database. In this example, the total size in Azure Cosmos DB turned out to be 65 terabytes. After the migration, indexing can be selectively turned on, and the RUs can be lowered to the level required by the workload's operations.

Contact the Azure Cosmos DB team

Although you can follow this guide to successfully migrate large datasets to Azure Cosmos DB, for large-scale migrations it is recommended that you reach out to the Azure Cosmos DB product team to validate your data modeling and for a general architecture review. Based on your dataset and workload, the product team can also suggest other performance and cost optimizations that could apply to you. To contact the Azure Cosmos DB team for assistance with large-scale migrations, you can open a support ticket.

Next steps

  • Learn more by trying out the sample applications consuming the bulk executor library in .NET and Java.

  • The bulk executor library is integrated into the Cosmos DB Spark connector. To learn more, see the Azure Cosmos DB Spark connector article.

  • Contact the Azure Cosmos DB product team by opening a support ticket for additional help with large-scale migrations.