Options to migrate your on-premises or cloud data to Azure Cosmos DB

APPLIES TO: SQL API, Cassandra API, Gremlin API, Table API, Azure Cosmos DB API for MongoDB

You can load data from various data sources into Azure Cosmos DB. Since Azure Cosmos DB supports multiple APIs, the target can be any of the existing APIs. The following are some scenarios where you migrate data to Azure Cosmos DB:

  • Move data from one Azure Cosmos container to another container in the same database or a different database.
  • Move data from dedicated containers to shared database containers.
  • Move data from an Azure Cosmos account located in region1 to another Azure Cosmos account in the same or a different region.
  • Move data from a source such as Azure Blob storage, a JSON file, an Oracle database, Couchbase, or DynamoDB into Azure Cosmos DB.

To support migration paths from the various sources to the different Azure Cosmos DB APIs, there are multiple solutions that provide specialized handling for each migration path. This document lists the available solutions and describes their advantages and limitations.

Factors affecting the choice of migration tool

The following factors determine the choice of migration tool:

  • Online vs. offline migration: Many migration tools provide a path to do a one-time migration only. This means that the applications accessing the database might experience a period of downtime. Some migration solutions provide a way to do a live migration, where a replication pipeline is set up between the source and the target.

  • Data source: The existing data can be in various data sources like Oracle DB2, Datastax Cassandra, Azure SQL Database, PostgreSQL, and so on. The data can also be in an existing Azure Cosmos DB account, and the intent of the migration can be to change the data model or to repartition the data in a container with a different partition key.

  • Azure Cosmos DB API: For the SQL API in Azure Cosmos DB, there is a variety of tools developed by the Azure Cosmos DB team that aid in the different migration scenarios. All of the other APIs have their own specialized sets of tools, developed and maintained by the community. Since Azure Cosmos DB supports these APIs at the wire protocol level, these tools should work as-is when migrating data into Azure Cosmos DB as well. However, they might require custom handling for throttling, because this concept is specific to Azure Cosmos DB.
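As a rough illustration of the kind of custom throttle handling such a tool might need, the sketch below retries a write when it hits a simulated "request rate too large" (429) response, honoring the server-suggested delay. The exception class and the write function here are stand-ins invented for the example, not part of any SDK.

```python
import time

class ThrottledError(Exception):
    """Stand-in for a 429 'request rate too large' response (hypothetical)."""
    def __init__(self, retry_after_ms=10):
        self.retry_after_ms = retry_after_ms

def write_with_retries(write_fn, max_retries=5):
    """Call write_fn, backing off and retrying whenever it is throttled."""
    for attempt in range(max_retries + 1):
        try:
            return write_fn()
        except ThrottledError as err:
            if attempt == max_retries:
                raise  # retry budget exhausted; surface the throttle
            time.sleep(err.retry_after_ms / 1000.0)

# Example: a write that is throttled twice before succeeding.
attempts = []
def flaky_write():
    attempts.append(1)
    if len(attempts) < 3:
        raise ThrottledError(retry_after_ms=1)
    return "ok"
```

Community tools written for the open-source engines typically lack exactly this loop, which is why they can fail outright against a provisioned-throughput account.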

  • Size of data: Most migration tools work very well for smaller datasets. When the dataset exceeds a few hundred gigabytes, the choice of migration tools is limited.

  • Expected migration duration: Migrations can be configured to take place at a slow, incremental pace that consumes less throughput, or they can consume the entire throughput provisioned on the target Azure Cosmos DB container and complete the migration in less time.
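To make that trade-off concrete, here is a back-of-the-envelope pacing calculation (the request-unit figures are made-up examples, not measured values): if each batch costs a known number of RUs, the wait between batches that keeps sustained consumption within a target RU/s budget is simply the batch cost divided by the budget.

```python
def pace_interval(ru_per_item, items_per_batch, ru_budget_per_sec):
    """Seconds to wait between batches so sustained RU use stays in budget."""
    ru_per_batch = ru_per_item * items_per_batch
    return ru_per_batch / ru_budget_per_sec

# E.g. 10 RU per write, 100-item batches, and a 4,000 RU/s budget
# imply one batch every 0.25 s (about 400 writes per second).
```

Setting the budget to a fraction of the container's provisioned throughput leaves headroom for the live application; setting it to the full amount minimizes migration time at the cost of throttling other traffic.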

Azure Cosmos DB SQL API

| Migration type | Solution | Supported sources | Supported targets | Considerations |
| --- | --- | --- | --- | --- |
| Offline | Data Migration Tool | JSON/CSV files, Azure Cosmos DB SQL API, MongoDB, SQL Server, Table Storage, AWS DynamoDB, Azure Blob storage | Azure Cosmos DB SQL API, Azure Cosmos DB Table API, JSON files | Easy to set up and supports multiple sources. Not suitable for large datasets. |
| Offline | Azure Data Factory | JSON/CSV files, Azure Cosmos DB SQL API, Azure Cosmos DB API for MongoDB, MongoDB, SQL Server, Table Storage, Azure Blob storage. See the Azure Data Factory article for other supported sources. | Azure Cosmos DB SQL API, Azure Cosmos DB API for MongoDB, JSON files. See the Azure Data Factory article for other supported targets. | Easy to set up and supports multiple sources. Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets. Lacks checkpointing: if an issue occurs during the migration, you need to restart the whole migration process. Lacks a dead-letter queue: a few erroneous files can stop the entire migration process. |
| Offline | Azure Cosmos DB Spark connector | Azure Cosmos DB SQL API. You can use other sources with additional connectors from the Spark ecosystem. | Azure Cosmos DB SQL API. You can use other targets with additional connectors from the Spark ecosystem. | Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets. Needs a custom Spark setup. Spark is sensitive to schema inconsistencies, which can be a problem during migration. |
| Offline | Custom tool with the Cosmos DB bulk executor library | The source depends on your custom code | Azure Cosmos DB SQL API | Provides checkpointing and dead-lettering capabilities, which increases migration resiliency. Suitable for very large datasets (10 TB+). Requires a custom setup of this tool running as an App Service. |
| Online | Cosmos DB Functions + ChangeFeed API | Azure Cosmos DB SQL API | Azure Cosmos DB SQL API | Easy to set up. Works only if the source is an Azure Cosmos DB container. Not suitable for large datasets. Does not capture deletes from the source container. |
| Online | Custom migration service using ChangeFeed | Azure Cosmos DB SQL API | Azure Cosmos DB SQL API | Provides progress tracking. Works only if the source is an Azure Cosmos DB container. Works for larger datasets as well. Requires the user to set up an App Service to host the change feed processor. Does not capture deletes from the source container. |
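The resiliency that checkpointing buys is easy to see in miniature. The sketch below is purely illustrative: an in-memory list stands in for the change feed, an integer continuation token stands in for the persisted checkpoint, and none of the names come from the Cosmos DB SDK.

```python
def read_batch(feed, token, batch_size=2):
    """Return the next batch after `token` plus the new continuation token."""
    new_token = min(token + batch_size, len(feed))
    return feed[token:new_token], new_token

def migrate(feed, target, checkpoints):
    """Copy the feed to the target, persisting a token after every batch."""
    token = checkpoints.get("token", 0)  # resume where the last run stopped
    while token < len(feed):
        batch, token = read_batch(feed, token)
        target.extend(batch)
        checkpoints["token"] = token  # a crash here loses at most one batch
    return target
```

A restarted run picks up from the stored token instead of re-copying everything, which is the property the table flags as missing from the Azure Data Factory path.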

Azure Cosmos DB Mongo API

| Migration type | Solution | Supported sources | Supported targets | Considerations |
| --- | --- | --- | --- | --- |
| Online | Azure Database Migration Service | MongoDB | Azure Cosmos DB API for MongoDB | Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets and takes care of replicating live changes. Works only with other MongoDB sources. |
| Offline | Azure Database Migration Service | MongoDB | Azure Cosmos DB API for MongoDB | Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets and takes care of replicating live changes. Works only with other MongoDB sources. |
| Offline | Azure Data Factory | JSON/CSV files, Azure Cosmos DB SQL API, Azure Cosmos DB API for MongoDB, MongoDB, SQL Server, Table Storage, Azure Blob storage. See the Azure Data Factory article for other supported sources. | Azure Cosmos DB SQL API, Azure Cosmos DB API for MongoDB, JSON files. See the Azure Data Factory article for other supported targets. | Easy to set up and supports multiple sources. Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets. Lacks checkpointing, which means that any issue during the migration would require a restart of the whole migration process. Lacks a dead-letter queue, which means that a few erroneous files could stop the entire migration process. Needs custom code to increase read throughput for certain data sources. |
| Offline | Existing Mongo tools (mongodump, mongorestore, Studio3T) | MongoDB | Azure Cosmos DB API for MongoDB | Easy to set up and integrate. Needs custom handling for throttling. |
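For instance, a one-shot dump-and-restore with the native tools might look like the following. The host names, database name, account name, and key are all placeholders you would replace with your own values.

```sh
# Dump the source database to a local directory (placeholder URI).
mongodump --uri "mongodb://<source-host>:27017/<database>" --out ./dump

# Restore the dump into the Cosmos DB account's Mongo endpoint
# (placeholder account name and key; the Mongo API listens on the
# SSL-enabled port 10255).
mongorestore --uri "mongodb://<account>:<key>@<account>.mongo.cosmos.azure.com:10255/<database>?ssl=true" ./dump/<database>
```

Because these tools are unaware of request-unit limits, a large restore may need to be slowed down or retried when the account throttles it, as the table above notes.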

Azure Cosmos DB Cassandra API

| Migration type | Solution | Supported sources | Supported targets | Considerations |
| --- | --- | --- | --- | --- |
| Offline | cqlsh COPY command | CSV files | Azure Cosmos DB Cassandra API | Easy to set up. Not suitable for large datasets. Works only when the source is a Cassandra table. |
| Offline | Copy table with Spark | Apache Cassandra, Azure Cosmos DB Cassandra API | Azure Cosmos DB Cassandra API | Can make use of Spark capabilities to parallelize transformation and ingestion. Needs configuration with a custom retry policy to handle throttling. |
| Online | Blitzz (from Oracle DB/Apache Cassandra) | Oracle, Apache Cassandra. See the Blitzz website for other supported sources. | Azure Cosmos DB Cassandra API. See the Blitzz website for other supported targets. | Supports larger datasets. Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment. |
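As an illustration of the cqlsh path, a minimal import of a CSV export might look like this; the keyspace, table, column names, and file name are placeholders.

```sql
-- Run inside cqlsh connected to the Cassandra API account.
COPY mykeyspace.mytable (id, name, value)
FROM 'export.csv'
WITH HEADER = TRUE AND CHUNKSIZE = 50;
```

Lowering the chunk size is one simple way to reduce the write rate when the account throttles the import.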

Other APIs

For APIs other than the SQL API, the Mongo API, and the Cassandra API, there are various tools supported by each API's existing ecosystem.

Table API

Gremlin API

Next steps

  • Learn more by trying out the sample applications that consume the bulk executor library in .NET and Java.
  • The bulk executor library is integrated into the Cosmos DB Spark connector; to learn more, see the Azure Cosmos DB Spark connector article.
  • For additional help with large-scale migrations, contact the Azure Cosmos DB product team by opening a support ticket under the "General Advisory" problem type and the "Large (TB+) migrations" problem subtype.