Data transfer for large datasets with moderate to high network bandwidth

This article gives an overview of the data transfer solutions for environments with moderate to high network bandwidth where you plan to transfer large datasets. It also describes the recommended data transfer options and the corresponding key capability matrix for this scenario.

For an overview of all the available data transfer options, see Choose an Azure data transfer solution.

Scenario description

Large datasets refer to data sizes on the order of TBs to PBs. Moderate to high network bandwidth refers to 100 Mbps to 10 Gbps.

The options recommended in this scenario depend on whether you have moderate or high network bandwidth.

Moderate network bandwidth (100 Mbps - 1 Gbps)

With moderate network bandwidth, you need to project how long the data transfer will take over the network.

Use the following table to estimate the time and, based on that estimate, choose between an offline transfer and a transfer over the network. The table shows the projected time for network data transfer at various available network bandwidths (assuming 90% utilization).
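The projection itself is simple arithmetic: divide the dataset size in bits by the effective bandwidth (nominal bandwidth times utilization). The following is a minimal sketch of that calculation, not an official calculator. The function name `projected_transfer_days` is hypothetical, and the sketch assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and the 90% utilization mentioned above.

```python
def projected_transfer_days(data_tb: float, bandwidth_mbps: float,
                            utilization: float = 0.9) -> float:
    """Estimate the days needed to move data_tb terabytes over the network.

    Assumes decimal units and a constant utilization of the link.
    """
    if data_tb < 0 or bandwidth_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, utilization in (0, 1]")
    total_bits = data_tb * 1e12 * 8                      # terabytes -> bits
    effective_bps = bandwidth_mbps * 1e6 * utilization   # usable bits per second
    seconds = total_bits / effective_bps
    return seconds / 86_400                              # seconds -> days


# Project a 100 TB transfer across the bandwidth range covered by this article.
for mbps in (100, 500, 1_000, 10_000):
    days = projected_transfer_days(100, mbps)
    print(f"{mbps:>6} Mbps: {days:.1f} days")
```

If the projected duration exceeds what your schedule allows, that points toward an offline transfer; otherwise a network transfer is likely reasonable. The acceptable threshold depends on your own constraints.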

Network transfer or offline transfer

  • If the network transfer is projected to be too slow, use a physical device. The recommended options in this case are the offline transfer devices from the Azure Data Box family or Azure Import/Export using your own disks.

    • Azure Data Box family for offline transfers - Use Azure-supplied Data Box devices to move large amounts of data to Azure when you're limited by time, network availability, or costs. Copy on-premises data using tools such as Robocopy. Depending on the size of the data you intend to transfer, you can choose Data Box Disk.
    • Azure Import/Export - Use the Azure Import/Export service to securely import large amounts of data to Azure Blob storage and Azure Files by shipping your own disk drives. You can also use this service to transfer data from Azure Blob storage to disk drives and have them shipped to your on-premises sites.
  • If the network transfer is projected to be reasonable, you can use any of the tools detailed in High network bandwidth.

High network bandwidth (1 Gbps - 100 Gbps)

If the available network bandwidth is high, use one of the following tools.

  • AzCopy - Use this command-line tool to copy data to and from Azure Blobs, Files, and Table storage with optimal performance. AzCopy supports concurrency and parallelism, and can resume copy operations that were interrupted.
  • Azure Storage REST APIs/SDKs - When building an application, you can develop it against the Azure Storage REST APIs and use the Azure SDKs offered in multiple languages.
  • Azure Data Factory - Use Data Factory to scale out a transfer operation, and when you need orchestration and enterprise-grade monitoring capabilities. Data Factory can regularly transfer files between several Azure services, on-premises systems, or a combination of the two. With Data Factory, you can create and schedule data-driven workflows (called pipelines) that ingest data from disparate data stores and automate data movement and data transformation.
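As an illustration of the AzCopy option, the sketch below assembles an `azcopy copy` command that uploads a local folder to a Blob container. It only builds and prints the command, so it runs without AzCopy installed. The helper name `build_azcopy_upload`, the account and container names, the SAS placeholder, and the default endpoint suffix are all assumptions made for the example; substitute the values and endpoint for your own environment.

```python
import shlex


def build_azcopy_upload(source_dir: str, account: str, container: str,
                        sas_token: str,
                        endpoint_suffix: str = "blob.core.windows.net") -> list[str]:
    """Build an `azcopy copy` argument list for a recursive folder upload.

    endpoint_suffix is an assumption; other Azure clouds use different suffixes.
    """
    destination = f"https://{account}.{endpoint_suffix}/{container}?{sas_token}"
    # --recursive copies the folder and everything beneath it.
    return ["azcopy", "copy", source_dir, destination, "--recursive"]


# Hypothetical values, for illustration only.
command = build_azcopy_upload("/data/archive", "mystorageaccount",
                              "backups", "<SAS-token>")
print(shlex.join(command))

# To run the transfer, pass the list to subprocess.run(command, check=True)
# on a machine where AzCopy is installed and the SAS token is valid.
```

Building the command as a list, rather than one string, avoids shell quoting problems with the SAS token when the command is later executed.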

Comparison of key capabilities

The following tables summarize the differences in key capabilities for the recommended options.

Moderate network bandwidth

If you use offline data transfer, use the following table to understand the differences in key capabilities.

| | Data Box Disk | Import/Export |
|---|---|---|
| Data size | Up to 35 TBs | Variable |
| Data type | Azure Blobs | Azure Blobs, Azure Files |
| Form factor | 5 SSDs per order | Up to 10 HDDs/SSDs per order |
| Initial setup time | Low (15 mins) | Moderate to difficult (variable) |
| Send data to Azure | Yes | Yes |
| Export data from Azure | No | Yes |
| Encryption | AES 128-bit | AES 128-bit |
| Hardware | Azure supplied | Customer supplied |
| Network interface | USB 3.1/SATA | SATA II/SATA III |
| Partner integration | Some | Some |
| Shipping | Azure managed | Customer managed |
| Use when data moves | Within a commerce boundary | Across geographic boundaries, e.g. US to EU |
| Pricing | Pricing | Pricing |
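One way to read the offline comparison is as a small decision rule. The sketch below encodes only the rows for data size, export support, and geographic scope. The function name `choose_offline_option` is hypothetical, and the rule is a simplified interpretation of the table rather than official sizing guidance.

```python
def choose_offline_option(data_tb: float,
                          export_from_azure: bool = False,
                          crosses_geography: bool = False) -> str:
    """Pick an offline option using three rows of the comparison table.

    Simplified reading of the table: Data Box Disk handles up to 35 TB,
    does not export from Azure, and is used within a commerce boundary.
    """
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    if export_from_azure or crosses_geography:
        return "Import/Export"      # only option listing export and cross-geography moves
    if data_tb <= 35:
        return "Data Box Disk"      # fits the per-order capacity in the table
    return "Import/Export"          # data size is listed as variable


print(choose_offline_option(20))
print(choose_offline_option(20, export_from_azure=True))
print(choose_offline_option(60))
```

Other rows, such as setup time, hardware ownership, and shipping, may matter more than these three in a given situation, so treat the result as a starting point.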

If you use online data transfer, use the table in the following section for high network bandwidth.

High network bandwidth

| | Tools: AzCopy, Azure PowerShell, Azure CLI | Azure Storage REST APIs, SDKs | Azure Data Factory |
|---|---|---|---|
| Data type | Azure Blobs, Azure Files, Azure Tables | Azure Blobs, Azure Files, Azure Tables | Supports 70+ data connectors for data stores and formats |
| Form factor | Command-line tools | Programmatic interface | Service in Azure portal |
| Initial one-time setup | Easy | Moderate | Extensive |
| Data pre-processing | No | No | Yes |
| Transfer from other clouds | No | No | Yes |
| User type | IT Pro or dev | Dev | IT Pro |
| Pricing | Free, data egress charges apply | Free, data egress charges apply | Pricing |

Next steps