Use the Data Migration tool to migrate your data to Azure Cosmos DB

This tutorial provides instructions on using the Azure Cosmos DB Data Migration tool, which can import data from various sources into Azure Cosmos DB collections and tables. You can import from JSON files, CSV files, SQL, MongoDB, Azure Table storage, Amazon DynamoDB, and even Azure Cosmos DB SQL API collections. You migrate that data to collections and tables for use with Azure Cosmos DB. The Data Migration tool can also be used when migrating from a single-partition collection to a multi-partition collection for the SQL API.

This tutorial covers the following tasks:

  • Installing the Data Migration tool
  • Importing data from different data sources
  • Exporting from Azure Cosmos DB to JSON

Prerequisites

Before following the instructions in this article, ensure that you do the following steps:

  • Install Microsoft .NET Framework 4.5.1 or higher.

  • Increase throughput: The duration of your data migration depends on the amount of throughput you set up for an individual collection or a set of collections. Be sure to increase the throughput for larger data migrations. After you've completed the migration, decrease the throughput to save costs. For more information about increasing throughput in the Azure portal, see performance levels and pricing tiers in Azure Cosmos DB.

  • Create Azure Cosmos DB resources: Before you start migrating data, pre-create all your collections from the Azure portal. To migrate to an Azure Cosmos DB account that has database-level throughput, provide a partition key when you create the Azure Cosmos DB collections.

Overview

The Data Migration tool is an open-source solution that imports data to Azure Cosmos DB from a variety of sources, including:

  • JSON files
  • MongoDB
  • SQL Server
  • CSV files
  • Azure Table storage
  • Amazon DynamoDB
  • HBase
  • Azure Cosmos DB collections

While the import tool includes a graphical user interface (dtui.exe), it can also be driven from the command line (dt.exe). In fact, there's an option to output the associated command after setting up an import through the UI. You can transform tabular source data, such as SQL Server or CSV files, to create hierarchical relationships (subdocuments) during import. Keep reading to learn more about source options, sample commands to import from each source, target options, and viewing import results.

Installation

The migration tool source code is available on GitHub in this repository. You can download and compile the solution locally, or download a pre-compiled binary, then run either:

  • Dtui.exe: Graphical interface version of the tool
  • Dt.exe: Command-line version of the tool
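
Every dt.exe run follows the same general shape: a /s:<source> option that names the source type plus /s.* options that configure it, and a /t:<target> option plus /t.* options for the target. The line below only illustrates that pattern with placeholder names; the real source and target options are covered in the sections that follow.

#General shape of a dt.exe invocation (placeholders only, not runnable as-is)
dt.exe /s:<SourceType> /s.<SourceOption>:<value> /t:<TargetType> /t.<TargetOption>:<value>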

Select data source

Once you've installed the tool, it's time to import your data. What kind of data do you want to import?

Import JSON files

The JSON file source importer option allows you to import one or more single-document JSON files or JSON files that each have an array of JSON documents. When adding folders that have JSON files to import, you have the option of recursively searching for files in subfolders.

Screenshot of JSON file source options - Database migration tool

The connection string is in the following format:

AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>

  • The <CosmosDB Endpoint> is the endpoint URI. You can get this value from the Azure portal. Navigate to your Azure Cosmos account. Open the Overview pane and copy the URI value.
  • The <AccountKey> is the "Password" or PRIMARY KEY. You can get this value from the Azure portal. Navigate to your Azure Cosmos account. Open the Connection Strings or Keys pane, and copy the "Password" or PRIMARY KEY value.
  • The <CosmosDB Database> is the CosmosDB database name.

Example: AccountEndpoint=https://myCosmosDBName.documents.azure.cn:443/;AccountKey=wJmFRYna6ttQ79ATmrTMKql8vPri84QBiHTt6oinFkZRvoe7Vv81x9sn6zlVlBY10bEPMgGM982wfYXpWXWB9w==;Database=myDatabaseName

Note

Use the Verify command to ensure that the Cosmos DB account specified in the connection string field can be accessed.

Here are some command-line samples to import JSON files:

#Import a single JSON file
dt.exe /s:JsonFile /s.Files:.\Sessions.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Sessions /t.CollectionThroughput:2500

#Import a directory of JSON files
dt.exe /s:JsonFile /s.Files:C:\TESessions\*.json /t:DocumentDBBulk /t.ConnectionString:" AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Sessions /t.CollectionThroughput:2500

#Import a directory (including sub-directories) of JSON files
dt.exe /s:JsonFile /s.Files:C:\LastFMMusic\**\*.json /t:DocumentDBBulk /t.ConnectionString:" AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Music /t.CollectionThroughput:2500

#Import a directory (single), directory (recursive), and individual JSON files
dt.exe /s:JsonFile /s.Files:C:\Tweets\*.*;C:\LargeDocs\**\*.*;C:\TESessions\Session48172.json;C:\TESessions\Session48173.json;C:\TESessions\Session48174.json;C:\TESessions\Session48175.json;C:\TESessions\Session48177.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:subs /t.CollectionThroughput:2500

#Import a single JSON file and partition the data across 4 collections
dt.exe /s:JsonFile /s.Files:D:\\CompanyData\\Companies.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:comp[1-4] /t.PartitionKey:name /t.CollectionThroughput:2500

Import from MongoDB

Important

If you're importing to a Cosmos account configured with Azure Cosmos DB's API for MongoDB, follow these instructions.

With the MongoDB source importer option, you can import from a single MongoDB collection, optionally filter documents using a query, and modify the document structure by using a projection.

Screenshot of MongoDB source options

The connection string is in the standard MongoDB format:

mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database>

Note

Use the Verify command to ensure that the MongoDB instance specified in the connection string field can be accessed.

Enter the name of the collection from which data will be imported. You may optionally specify or provide a file for a query, such as {pop: {$gt:5000}}, or a projection, such as {loc:0}, to both filter and shape the data that you're importing.

Here are some command-line samples to import from MongoDB:

#Import all documents from a MongoDB collection
dt.exe /s:MongoDB /s.ConnectionString:mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database> /s.Collection:zips /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:BulkZips /t.IdField:_id /t.CollectionThroughput:2500

#Import documents from a MongoDB collection which match the query and exclude the loc field
dt.exe /s:MongoDB /s.ConnectionString:mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database> /s.Collection:zips /s.Query:{pop:{$gt:50000}} /s.Projection:{loc:0} /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:BulkZipsTransform /t.IdField:_id /t.CollectionThroughput:2500

Import MongoDB export files

Important

If you're importing to an Azure Cosmos DB account with support for MongoDB, follow these instructions.

The MongoDB export JSON file source importer option allows you to import one or more JSON files produced from the mongoexport utility.

Screenshot of MongoDB export source options

When adding folders that have MongoDB export JSON files for import, you have the option of recursively searching for files in subfolders.

Here is a command-line sample to import from MongoDB export JSON files:

dt.exe /s:MongoDBExport /s.Files:D:\mongoemployees.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:employees /t.IdField:_id /t.Dates:Epoch /t.CollectionThroughput:2500

Import from SQL Server

The SQL source importer option allows you to import from an individual SQL Server database and optionally filter the records to be imported using a query. In addition, you can modify the document structure by specifying a nesting separator (more on that in a moment).

Screenshot of SQL source options - Database migration tool

The format of the connection string is the standard SQL connection string format.

Note

Use the Verify command to ensure that the SQL Server instance specified in the connection string field can be accessed.

The nesting separator property is used to create hierarchical relationships (sub-documents) during import. Consider the following SQL query:

select CAST(BusinessEntityID AS varchar) as Id, Name, AddressType as [Address.AddressType], AddressLine1 as [Address.AddressLine1], City as [Address.Location.City], StateProvinceName as [Address.Location.StateProvinceName], PostalCode as [Address.PostalCode], CountryRegionName as [Address.CountryRegionName] from Sales.vStoreWithAddresses WHERE AddressType='Main Office'

This query returns the following (partial) results:

Screenshot of SQL query results

Note the aliases such as Address.AddressType and Address.Location.StateProvinceName. By specifying a nesting separator of '.', the import tool creates Address and Address.Location subdocuments during the import. Here is an example of a resulting document in Azure Cosmos DB:

{ "id": "956", "Name": "Finer Sales and Service", "Address": { "AddressType": "Main Office", "AddressLine1": "#500-75 O'Connor Street", "Location": { "City": "Ottawa", "StateProvinceName": "Ontario" }, "PostalCode": "K4B 1S2", "CountryRegionName": "Canada" } }

Here are some command-line samples to import from SQL Server:

#Import records from SQL which match a query
dt.exe /s:SQL /s.ConnectionString:"Data Source=<server>;Initial Catalog=AdventureWorks;User Id=advworks;Password=<password>;" /s.Query:"select CAST(BusinessEntityID AS varchar) as Id, * from Sales.vStoreWithAddresses WHERE AddressType='Main Office'" /t:DocumentDBBulk /t.ConnectionString:" AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Stores /t.IdField:Id /t.CollectionThroughput:2500

#Import records from sql which match a query and create hierarchical relationships
dt.exe /s:SQL /s.ConnectionString:"Data Source=<server>;Initial Catalog=AdventureWorks;User Id=advworks;Password=<password>;" /s.Query:"select CAST(BusinessEntityID AS varchar) as Id, Name, AddressType as [Address.AddressType], AddressLine1 as [Address.AddressLine1], City as [Address.Location.City], StateProvinceName as [Address.Location.StateProvinceName], PostalCode as [Address.PostalCode], CountryRegionName as [Address.CountryRegionName] from Sales.vStoreWithAddresses WHERE AddressType='Main Office'" /s.NestingSeparator:. /t:DocumentDBBulk /t.ConnectionString:" AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:StoresSub /t.IdField:Id /t.CollectionThroughput:2500

Import CSV files and convert CSV to JSON

The CSV file source importer option enables you to import one or more CSV files. When adding folders that have CSV files for import, you have the option of recursively searching for files in subfolders.

Screenshot of CSV source options - CSV to JSON

Similar to the SQL source, the nesting separator property may be used to create hierarchical relationships (sub-documents) during import. Consider the following CSV header row and data rows:

Screenshot of CSV sample records - CSV to JSON

Note the aliases such as DomainInfo.Domain_Name and RedirectInfo.Redirecting. By specifying a nesting separator of '.', the import tool will create DomainInfo and RedirectInfo subdocuments during the import. Here is an example of a resulting document in Azure Cosmos DB:

{ "DomainInfo": { "Domain_Name": "ACUS.GOV", "Domain_Name_Address": "https://www.ACUS.GOV" }, "Federal Agency": "Administrative Conference of the United States", "RedirectInfo": { "Redirecting": "0", "Redirect_Destination": "" }, "id": "9cc565c5-ebcd-1c03-ebd3-cc3e2ecd814d" }

The import tool tries to infer type information for unquoted values in CSV files (quoted values are always treated as strings). Types are identified in the following order: number, datetime, boolean.

There are two other things to note about CSV import:

  1. By default, unquoted values are always trimmed for tabs and spaces, while quoted values are preserved as-is. This behavior can be overridden with the Trim quoted values checkbox or the /s.TrimQuoted command-line option.
  2. By default, an unquoted null is treated as a null value. This behavior can be overridden (that is, treat an unquoted null as a "null" string) with the Treat unquoted NULL as string checkbox or the /s.NoUnquotedNulls command-line option.

Here is a command-line sample for CSV import:

dt.exe /s:CsvFile /s.Files:.\Employees.csv /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Employees /t.IdField:EntityID /t.CollectionThroughput:2500
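
If you also want the nested-column behavior and the quote/null handling described above from the command line, a variant along these lines should work. It assumes the CSV source accepts the same /s.NestingSeparator switch shown earlier for SQL, and the Domains.csv file with dotted headers such as DomainInfo.Domain_Name is hypothetical; /s.TrimQuoted and /s.NoUnquotedNulls are the command-line options named in the list above.

#Import a CSV file with dotted headers as subdocuments, trimming quoted values and treating unquoted NULL as a string (sketch)
dt.exe /s:CsvFile /s.Files:.\Domains.csv /s.NestingSeparator:. /s.TrimQuoted /s.NoUnquotedNulls /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Domains /t.CollectionThroughput:2500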

Import from Azure Table storage

The Azure Table storage source importer option allows you to import from an individual Azure Table storage table. Optionally, you can filter the table entities to be imported.

You may output data that was imported from Azure Table storage to Azure Cosmos DB tables and entities for use with the Table API. Imported data can also be output to collections and documents for use with the SQL API. However, Table API is only available as a target in the command-line utility; you can't export to Table API by using the Data Migration tool user interface. For more information, see Import data for use with the Azure Cosmos DB Table API. A command-line sketch that targets the Table API follows the SQL API sample later in this section.

Screenshot of Azure Table storage source options

The format of the Azure Table storage connection string is:

DefaultEndpointsProtocol=<protocol>;AccountName=<Account Name>;AccountKey=<Account Key>;EndpointSuffix=core.chinacloudapi.cn;

Note

Use the Verify command to ensure that the Azure Table storage instance specified in the connection string field can be accessed.

Enter the name of the Azure table to import from. You may optionally specify a filter.

The Azure Table storage source importer option has the following additional options:

  1. Include Internal Fields
    1. All - Include all internal fields (PartitionKey, RowKey, and Timestamp)
    2. None - Exclude all internal fields
    3. RowKey - Only include the RowKey field
  2. Select Columns
    1. Azure Table storage filters don't support projections. If you want to import only specific Azure Table entity properties, add them to the Select Columns list. All other entity properties are ignored.

Here is a command-line sample to import from Azure Table storage:

dt.exe /s:AzureTable /s.ConnectionString:"DefaultEndpointsProtocol=https;AccountName=<Account Name>;AccountKey=<Account Key>;EndpointSuffix=core.chinacloudapi.cn" /s.Table:metrics /s.InternalFields:All /s.Filter:"PartitionKey eq 'Partition1' and RowKey gt '00001'" /s.Projection:ObjectCount;ObjectSize  /t:DocumentDBBulk /t.ConnectionString:" AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:metrics /t.CollectionThroughput:2500
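
The preceding sample writes into a SQL API collection. If your target is the Table API instead (which, as noted above, is only possible from the command line), the invocation might look like the sketch below. The /t:TableAPIBulk target type and the /t.TableName switch are assumptions based on the Table API import article linked above; verify the exact switch names there before relying on them.

#Import an Azure Table storage table into an Azure Cosmos DB Table API table (sketch; verify the TableAPIBulk switches)
dt.exe /s:AzureTable /s.ConnectionString:"DefaultEndpointsProtocol=https;AccountName=<Account Name>;AccountKey=<Account Key>;EndpointSuffix=core.chinacloudapi.cn" /s.Table:metrics /s.InternalFields:All /t:TableAPIBulk /t.ConnectionString:"DefaultEndpointsProtocol=https;AccountName=<Cosmos DB Account Name>;AccountKey=<Cosmos DB Account Key>;TableEndpoint=<Cosmos DB Table Endpoint>;" /t.TableName:metrics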

Import from Amazon DynamoDB

The Amazon DynamoDB source importer option allows you to import from a single Amazon DynamoDB table. It can optionally filter the entities to be imported. Several templates are provided so that setting up an import is as easy as possible.

Screenshot of Amazon DynamoDB source options - Database migration tool

The format of the Amazon DynamoDB connection string is:

ServiceURL=<Service Address>;AccessKey=<Access Key>;SecretKey=<Secret Key>;

Note

Use the Verify command to ensure that the Amazon DynamoDB instance specified in the connection string field can be accessed.

Here is a command-line sample to import from Amazon DynamoDB:

dt.exe /s:DynamoDB /s.ConnectionString:ServiceURL=https://dynamodb.us-east-1.amazonaws.com;AccessKey=<accessKey>;SecretKey=<secretKey> /s.Request:"{   """TableName""": """ProductCatalog""" }" /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<Azure Cosmos DB Endpoint>;AccountKey=<Azure Cosmos DB Key>;Database=<Azure Cosmos DB Database>;" /t.Collection:catalogCollection /t.CollectionThroughput:2500

Import from Azure Blob storage

The JSON file, MongoDB export file, and CSV file source importer options allow you to import one or more files from Azure Blob storage. After specifying a Blob container URL and account key, provide a regular expression to select the file(s) to import.

Screenshot of Blob file source options

Here is a command-line sample to import JSON files from Azure Blob storage:

dt.exe /s:JsonFile /s.Files:"blobs://<account key>@account.blob.core.chinacloudapi.cn:443/importcontainer/.*" /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:doctest

Import from a SQL API collection

The Azure Cosmos DB source importer option allows you to import data from one or more Azure Cosmos DB collections and optionally filter documents using a query.

Screenshot of Azure Cosmos DB source options

The format of the Azure Cosmos DB connection string is:

AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;

You can retrieve the Azure Cosmos DB account connection string from the Keys page of the Azure portal, as described in How to manage an Azure Cosmos DB account. However, the name of the database needs to be appended to the connection string in the following format:

Database=<CosmosDB Database>;

Note

Use the Verify command to ensure that the Azure Cosmos DB instance specified in the connection string field can be accessed.

To import from a single Azure Cosmos DB collection, enter the name of the collection to import data from. To import from more than one Azure Cosmos DB collection, provide a regular expression to match one or more collection names (for example, collection01 | collection02 | collection03). You may optionally specify, or provide a file for, a query to both filter and shape the data that you're importing.

Note

Since the collection field accepts regular expressions, if you're importing from a single collection whose name has regular expression characters, then those characters must be escaped accordingly.

The Azure Cosmos DB source importer option has the following advanced options:

  1. Include Internal Fields: Specifies whether or not to include Azure Cosmos DB document system properties in the export (for example, _rid, _ts).
  2. Number of Retries on Failure: Specifies the number of times to retry the connection to Azure Cosmos DB in case of transient failures (for example, network connectivity interruption).
  3. Retry Interval: Specifies how long to wait between retries of the connection to Azure Cosmos DB in case of transient failures (for example, network connectivity interruption).
  4. Connection Mode: Specifies the connection mode to use with Azure Cosmos DB. The available choices are DirectTcp, DirectHttps, and Gateway. The direct connection modes are faster, while the gateway mode is more firewall friendly as it only uses port 443.

Screenshot of Azure Cosmos DB source advanced options

Tip

The import tool defaults to the DirectTcp connection mode. If you experience firewall issues, switch to the Gateway connection mode, as it only requires port 443.

Here are some command-line samples to import from Azure Cosmos DB:

#Migrate data from one Azure Cosmos DB collection to another Azure Cosmos DB collection
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /s.Collection:TEColl /t:DocumentDBBulk /t.ConnectionString:" AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:TESessions /t.CollectionThroughput:2500

#Migrate data from more than one Azure Cosmos DB collection to a single Azure Cosmos DB collection
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /s.Collection:comp1|comp2|comp3|comp4 /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:singleCollection /t.CollectionThroughput:2500

#Export an Azure Cosmos DB collection to a JSON file
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /s.Collection:StoresSub /t:JsonFile /t.File:StoresExport.json /t.Overwrite /t.CollectionThroughput:2500
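
The connection mode mentioned in the tip above can also be set on the command line. The sketch below assumes the switch is named /s.ConnectionMode on the source side and /t.ConnectionMode on the target side; confirm the exact names with dt.exe /? before using them.

#Copy a collection while forcing the firewall-friendly Gateway connection mode (sketch; the ConnectionMode switch names are assumptions)
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /s.Collection:TEColl /s.ConnectionMode:Gateway /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:TESessions /t.ConnectionMode:Gateway /t.CollectionThroughput:2500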

Tip

The Azure Cosmos DB Data Migration tool also supports importing data from the Azure Cosmos DB Emulator. When importing data from a local emulator, set the endpoint to https://localhost:<port>.
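
For example, a local emulator source might be addressed as in the sketch below. It assumes the emulator's default port (8081) and its well-known fixed key; adjust both if your emulator is configured differently.

#Export a collection from the local Azure Cosmos DB Emulator to a JSON file (sketch; assumes the default emulator port and key)
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==;Database=<Emulator Database>;" /s.Collection:TEColl /t:JsonFile /t.File:EmulatorExport.json /t.Overwrite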

Import from HBase

The HBase source importer option allows you to import data from an HBase table and optionally filter the data. Several templates are provided so that setting up an import is as easy as possible.

Screenshot of HBase source options

The format of the HBase Stargate connection string is:

ServiceURL=<server-address>;Username=<username>;Password=<password>

Note

Use the Verify command to ensure that the HBase instance specified in the connection string field can be accessed.

Here is a command-line sample to import from HBase:

dt.exe /s:HBase /s.ConnectionString:ServiceURL=<server-address>;Username=<username>;Password=<password> /s.Table:Contacts /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:hbaseimport

Import to the SQL API (Bulk Import)

The Azure Cosmos DB Bulk importer allows you to import from any of the available source options, using an Azure Cosmos DB stored procedure for efficiency. The tool supports import to one single-partitioned Azure Cosmos DB collection. It also supports sharded import whereby data is partitioned across more than one single-partitioned Azure Cosmos DB collection. For more information about partitioning data, see Partitioning and scaling in Azure Cosmos DB. The tool creates, executes, and then deletes the stored procedure from the target collection(s).

Screenshot of Azure Cosmos DB bulk options

The format of the Azure Cosmos DB connection string is:

AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;

The Azure Cosmos DB account connection string can be retrieved from the Keys page of the Azure portal, as described in How to manage an Azure Cosmos DB account. However, the name of the database needs to be appended to the connection string in the following format:

Database=<CosmosDB Database>;

Note

Use the Verify command to ensure that the Azure Cosmos DB instance specified in the connection string field can be accessed.

To import to a single collection, enter the name of the collection to import data into, and click the Add button. To import to more than one collection, either enter each collection name individually or use the following syntax to specify more than one collection: collection_prefix[start index - end index]. When specifying more than one collection using this syntax, keep the following guidelines in mind:

  1. Only integer range name patterns are supported. For example, specifying collection[0-3] creates the following collections: collection0, collection1, collection2, collection3.
  2. You can use an abbreviated syntax: collection[3] creates the same set of collections mentioned in step 1.
  3. More than one substitution can be provided. For example, collection[0-1] [0-9] generates 20 collection names with leading zeros (collection01, ..02, ..03).

Once the collection name(s) have been specified, choose the desired throughput of the collection(s) (400 RUs to 10,000 RUs). For best import performance, choose a higher throughput. For more information about performance levels, see Performance levels in Azure Cosmos DB.

Note

The performance throughput setting only applies to collection creation. If the specified collection already exists, its throughput won't be modified.

When you import to more than one collection, the import tool supports hash-based sharding. In this scenario, specify the document property you wish to use as the partition key. (If the partition key is left blank, documents are sharded randomly across the target collections.)

You may optionally specify which field in the import source should be used as the Azure Cosmos DB document ID property during the import. If documents don't have this property, the import tool generates a GUID as the ID property value.

There are a number of advanced options available during import. First, while the tool includes a default bulk import stored procedure (BulkInsert.js), you may choose to specify your own import stored procedure:

Screenshot of Azure Cosmos DB bulk insert stored procedure options

Additionally, when importing date types (for example, from SQL Server or MongoDB), you can choose between three import options:

Screenshot of Azure Cosmos DB date-time import options

  • String: Persist as a string value
  • Epoch: Persist as an Epoch number value
  • Both: Persist both string and Epoch number values. This option creates a subdocument, for example: "date_joined": { "Value": "2013-10-21T21:17:25.2410000Z", "Epoch": 1382390245 }
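
The date handling can also be chosen on the command line. The MongoDB export sample earlier in this article uses /t.Dates:Epoch; the sketch below assumes the same switch also accepts the String and Both values listed above.

#Import from SQL Server and persist date columns as both string and Epoch values (sketch; the /t.Dates values are assumed from the options above)
dt.exe /s:SQL /s.ConnectionString:"Data Source=<server>;Initial Catalog=AdventureWorks;User Id=advworks;Password=<password>;" /s.Query:"select CAST(BusinessEntityID AS varchar) as Id, * from Sales.vStoreWithAddresses" /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Stores /t.IdField:Id /t.Dates:Both /t.CollectionThroughput:2500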

The Azure Cosmos DB Bulk importer has the following additional advanced options:

  1. Batch Size: The tool defaults to a batch size of 50. If the documents to be imported are large, consider lowering the batch size. Conversely, if the documents to be imported are small, consider raising the batch size.
  2. Max Script Size (bytes): The tool defaults to a max script size of 512 KB.
  3. Disable Automatic Id Generation: If every document to be imported has an ID field, then selecting this option can increase performance. Documents missing a unique ID field aren't imported.
  4. Update Existing Documents: The tool defaults to not replacing existing documents with ID conflicts. Selecting this option allows overwriting existing documents with matching IDs. This feature is useful for scheduled data migrations that update existing documents.
  5. Number of Retries on Failure: Specifies how often to retry the connection to Azure Cosmos DB during transient failures (for example, network connectivity interruption).
  6. Retry Interval: Specifies how long to wait between retries of the connection to Azure Cosmos DB in case of transient failures (for example, network connectivity interruption).
  7. Connection Mode: Specifies the connection mode to use with Azure Cosmos DB. The available choices are DirectTcp, DirectHttps, and Gateway. The direct connection modes are faster, while the gateway mode is more firewall friendly as it only uses port 443.
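
Most of these options have command-line equivalents. The sketch below assumes the switch names /t.BatchSize, /t.DisableIdGeneration, and /t.UpdateExisting; they follow the tool's /t.* naming pattern but should be verified with dt.exe /? before use.

#Bulk import with a smaller batch size, no automatic id generation, and overwrite of documents with matching ids (sketch; switch names are assumptions)
dt.exe /s:JsonFile /s.Files:.\Sessions.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Sessions /t.BatchSize:10 /t.DisableIdGeneration /t.UpdateExisting /t.CollectionThroughput:2500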

Screenshot of Azure Cosmos DB bulk import advanced options

Tip

The import tool defaults to the DirectTcp connection mode. If you experience firewall issues, switch to the Gateway connection mode, as it only requires port 443.

Import to the SQL API (Sequential Record Import)

The Azure Cosmos DB sequential record importer allows you to import from any of the available source options on a record-by-record basis. You might choose this option if you're importing to an existing collection that has reached its quota of stored procedures. The tool supports import to a single (both single-partition and multi-partition) Azure Cosmos DB collection. It also supports sharded import whereby data is partitioned across more than one single-partition or multi-partition Azure Cosmos DB collection. For more information about partitioning data, see Partitioning and scaling in Azure Cosmos DB.

Screenshot of Azure Cosmos DB sequential record import options

The format of the Azure Cosmos DB connection string is:

AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;

You can retrieve the connection string for the Azure Cosmos DB account from the Keys page of the Azure portal, as described in How to manage an Azure Cosmos DB account. However, the name of the database needs to be appended to the connection string in the following format:

Database=<Azure Cosmos DB Database>;

Note

Use the Verify command to ensure that the Azure Cosmos DB instance specified in the connection string field can be accessed.

To import to a single collection, enter the name of the collection to import data into, and then click the Add button. To import to more than one collection, enter each collection name individually. You may also use the following syntax to specify more than one collection: collection_prefix[start index - end index]. When specifying more than one collection via this syntax, keep the following guidelines in mind:

  1. Only integer range name patterns are supported. For example, specifying collection[0-3] creates the following collections: collection0, collection1, collection2, collection3.
  2. You can use an abbreviated syntax: collection[3] creates the same set of collections mentioned in step 1.
  3. More than one substitution can be provided. For example, collection[0-1] [0-9] creates 20 collection names with leading zeros (collection01, ..02, ..03).

Once the collection name(s) have been specified, choose the desired throughput of the collection(s) (400 RUs to 250,000 RUs). For best import performance, choose a higher throughput. For more information about performance levels, see Performance levels in Azure Cosmos DB. Any import to collections with throughput greater than 10,000 RUs requires a partition key. If you choose to have more than 250,000 RUs, you need to file a request in the portal to have your account limit increased.

Note

The throughput setting only applies to collection or database creation. If the specified collection already exists, its throughput won't be modified.

When importing to more than one collection, the import tool supports hash-based sharding. In this scenario, specify the document property you wish to use as the partition key. (If the partition key is left blank, documents are sharded randomly across the target collections.)

You may optionally specify which field in the import source should be used as the Azure Cosmos DB document ID property during the import. (If documents don't have this property, the import tool generates a GUID as the ID property value.)

There are a number of advanced options available during import. First, when importing date types (for example, from SQL Server or MongoDB), you can choose between three import options:

Screenshot of Azure Cosmos DB date-time import options

  • String: Persist as a string value
  • Epoch: Persist as an Epoch number value
  • Both: Persist both string and Epoch number values. This option creates a subdocument, for example: "date_joined": { "Value": "2013-10-21T21:17:25.2410000Z", "Epoch": 1382390245 }

The Azure Cosmos DB sequential record importer has the following additional advanced options:

  1. Number of Parallel Requests: The tool defaults to two parallel requests. If the documents to be imported are small, consider raising the number of parallel requests. If this number is raised too much, the import may experience rate limiting.
  2. Disable Automatic Id Generation: If every document to be imported has an ID field, then selecting this option can increase performance. Documents missing a unique ID field aren't imported.
  3. Update Existing Documents: The tool defaults to not replacing existing documents with ID conflicts. Selecting this option allows overwriting existing documents with matching IDs. This feature is useful for scheduled data migrations that update existing documents.
  4. Number of Retries on Failure: Specifies how often to retry the connection to Azure Cosmos DB during transient failures (for example, network connectivity interruption).
  5. Retry Interval: Specifies how long to wait between retries of the connection to Azure Cosmos DB during transient failures (for example, network connectivity interruption).
  6. Connection Mode: Specifies the connection mode to use with Azure Cosmos DB. The available choices are DirectTcp, DirectHttps, and Gateway. The direct connection modes are faster, while the gateway mode is more firewall friendly as it only uses port 443.
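
For reference, the sequential importer is selected with a /t:DocumentDB target (the bulk importer uses /t:DocumentDBBulk, as in the earlier samples). The /t.ParallelRequests switch in the sketch below, matching the first option above, is an assumption; verify both names with dt.exe /? before use.

#Sequential, record-by-record import of a JSON file with four parallel requests (sketch; /t:DocumentDB and /t.ParallelRequests are assumptions)
dt.exe /s:JsonFile /s.Files:.\Sessions.json /t:DocumentDB /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Sessions /t.ParallelRequests:4 /t.CollectionThroughput:2500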

Screenshot of Azure Cosmos DB sequential record import advanced options

Tip

The import tool defaults to the DirectTcp connection mode. If you experience firewall issues, switch to the Gateway connection mode, as it only requires port 443.

Specify an indexing policy

When you allow the migration tool to create Azure Cosmos DB SQL API collections during import, you can specify the indexing policy of the collections. In the advanced options section of the Azure Cosmos DB Bulk import and Azure Cosmos DB Sequential record options, navigate to the Indexing Policy section.

Screenshot of Azure Cosmos DB indexing policy advanced options

Using the Indexing Policy advanced option, you can select an indexing policy file, manually enter an indexing policy, or select from a set of default templates (by right-clicking in the indexing policy textbox).

The policy templates the tool provides are:

  • Default. This policy is best when you perform equality queries against strings. It also works if you use ORDER BY, range, and equality queries for numbers. This policy has a lower index storage overhead than Range.
  • Range. This policy is best when you use ORDER BY, range, and equality queries on both numbers and strings. This policy has a higher index storage overhead than Default or Hash.

Screenshot of Azure Cosmos DB indexing policy advanced options

Note

If you don't specify an indexing policy, the default policy is applied. For more information about indexing policies, see Azure Cosmos DB indexing policies.
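
The indexing policy can also be supplied from the command line when the tool creates the target collection. The sketch below assumes a /t.IndexingPolicyFile switch that points to a JSON policy file (an inline /t.IndexingPolicy variant may also exist); treat both names as assumptions and confirm them with dt.exe /?.

#Bulk import while creating the target collection with an indexing policy read from a file (sketch; /t.IndexingPolicyFile is an assumption)
dt.exe /s:JsonFile /s.Files:.\Sessions.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Sessions /t.IndexingPolicyFile:.\RangePolicy.json /t.CollectionThroughput:2500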

Export to JSON file

The Azure Cosmos DB JSON exporter allows you to export any of the available source options to a JSON file that has an array of JSON documents. The tool handles the export for you. Alternatively, you can choose to view the resulting migration command and run the command yourself. The resulting JSON file may be stored locally or in Azure Blob storage.

Screenshot of Azure Cosmos DB JSON local file export options

Screenshot of Azure Cosmos DB JSON Azure Blob storage export options

You may optionally choose to prettify the resulting JSON. This action increases the size of the resulting document while making the contents more human readable.

  • Standard JSON export

    [{"id":"Sample","Title":"About Paris","Language":{"Name":"English"},"Author":{"Name":"Don","Location":{"City":"Paris","Country":"France"}},"Content":"Don's document in Azure Cosmos DB is a valid JSON document as defined by the JSON spec.","PageViews":10000,"Topics":[{"Title":"History of Paris"},{"Title":"Places to see in Paris"}]}]
    
  • Prettified JSON export

      [
       {
      "id": "Sample",
      "Title": "About Paris",
      "Language": {
        "Name": "English"
      },
      "Author": {
        "Name": "Don",
        "Location": {
          "City": "Paris",
          "Country": "France"
        }
      },
      "Content": "Don's document in Azure Cosmos DB is a valid JSON document as defined by the JSON spec.",
      "PageViews": 10000,
      "Topics": [
        {
          "Title": "History of Paris"
        },
        {
          "Title": "Places to see in Paris"
        }
      ]
      }]
    

Here is a command-line sample to export the JSON file to Azure Blob storage:

dt.exe /ErrorDetails:All /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB database_name>" /s.Collection:<CosmosDB collection_name>
/t:JsonFile /t.File:"blobs://<Storage account key>@<Storage account name>.blob.core.chinacloudapi.cn:443/<Container_name>/<Blob_name>"
/t.Overwrite
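
To write the export to a local file instead, point /t.File at a path on disk. The prettified output shown above is selected in the UI; the /t.Prettify switch in the sketch below is an assumption about that option's command-line name, so confirm it with dt.exe /? (or omit it for standard output).

#Export a collection to a local, prettified JSON file (sketch; /t.Prettify is an assumption)
dt.exe /ErrorDetails:All /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB database_name>" /s.Collection:<CosmosDB collection_name> /t:JsonFile /t.File:.\StoresExport.json /t.Prettify /t.Overwrite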

Advanced configuration

In the Advanced configuration screen, specify the location of the log file to which you would like any errors written. The following rules apply to this page:

  1. If a file name isn't provided, then all errors are returned on the Results page.

  2. If a file name is provided without a directory, then the file is created (or overwritten) in the current environment directory.

  3. If you select an existing file, the file is overwritten; there's no append option.

  4. Then, choose whether to log all, critical, or no error messages. Finally, decide how frequently the on-screen transfer message is updated with its progress.

    Screenshot of the Advanced configuration screen
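
The same logging behavior can be requested from the command line. The export sample earlier already uses /ErrorDetails:All; the /ErrorLog switch in the sketch below, which names the file errors are written to, is an assumption to confirm with dt.exe /?.

#Import with error logging to a file and only critical error details (sketch; /ErrorLog is an assumption)
dt.exe /ErrorLog:.\import-errors.csv /ErrorDetails:Critical /s:JsonFile /s.Files:.\Sessions.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Sessions /t.CollectionThroughput:2500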

Confirm import settings and view command line

  1. After you specify the source information, target information, and advanced configuration, review the migration summary and, if you want, view or copy the resulting migration command. (Copying the command is useful to automate import operations.)

    Screenshot of the summary screen

  2. Once you're satisfied with your source and target options, click Import. The elapsed time, transferred count, and failure information (if you didn't provide a file name in the Advanced configuration) update as the import is in progress. Once complete, you can export the results (for example, to deal with any import failures).

    Screenshot of Azure Cosmos DB JSON export options

  3. You may also start a new import by either resetting all values or keeping the existing settings. (For example, you may choose to keep connection string information, source and target choices, and more.)

    Screenshot of Azure Cosmos DB JSON export options

Next steps

In this tutorial, you've done the following tasks:

  • Installed the Data Migration tool
  • Imported data from different data sources
  • Exported from Azure Cosmos DB to JSON

You can now proceed to the next tutorial and learn how to query data using Azure Cosmos DB.