Data transfer for large datasets with moderate to high network bandwidth
This article provides an overview of the data transfer solutions available when you have moderate to high network bandwidth in your environment and plan to transfer large datasets. It also describes the recommended data transfer options and the key capability matrix for each option in this scenario.
For an overview of all the available data transfer options, see Choose an Azure data transfer solution.
Scenario description
Large datasets refer to data sizes on the order of terabytes (TB) to petabytes (PB). Moderate to high network bandwidth refers to 100 Mbps to 100 Gbps.
Recommended options
The options recommended in this scenario depend on whether you have moderate network bandwidth or high network bandwidth.
Moderate network bandwidth (100 Mbps - 1 Gbps)
With moderate network bandwidth, you need to project the time for data transfer over the network.
Use the following table to estimate the time and, based on that estimate, choose between an offline transfer and a transfer over the network. The table shows the projected time for network data transfer at various available network bandwidths (assuming 90% utilization).
If the network transfer is projected to be too slow, you should use a physical device. The recommended options in this case are the offline transfer devices from the Azure Data Box family or Azure Import/Export using your own disks.
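The projection itself is simple arithmetic, and the sketch below shows one way to reproduce it for your own data size and link speed. It is an illustration rather than an official calculator: the function name `transfer_time_days` is hypothetical, and it assumes decimal units (1 TB = 10^12 bytes), a constant 90% link utilization, and no protocol or throttling overhead.

```python
def transfer_time_days(data_tb: float, bandwidth_mbps: float,
                       utilization: float = 0.9) -> float:
    """Return the projected number of days to send data_tb terabytes
    over a link of bandwidth_mbps megabits per second.

    Assumptions (not from the article): decimal units, constant
    utilization, and no protocol overhead.
    """
    if data_tb < 0 or bandwidth_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("sizes must be non-negative, bandwidth positive, "
                         "and utilization in (0, 1]")
    total_bits = data_tb * 1e12 * 8                      # terabytes -> bits
    effective_bps = bandwidth_mbps * 1e6 * utilization   # usable bits per second
    seconds = total_bits / effective_bps
    return seconds / 86_400                              # seconds -> days


if __name__ == "__main__":
    # Print a small grid of projections for a few sizes and link speeds.
    for size_tb in (1, 10, 100, 1000):
        for mbps in (100, 1_000, 10_000):
            days = transfer_time_days(size_tb, mbps)
            print(f"{size_tb:>5} TB at {mbps:>6} Mbps: {days:10.2f} days")
```

Under these assumptions, 100 TB over a 100 Mbps link comes to roughly 103 days, which is the kind of result that usually points toward an offline transfer.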
- Azure Data Box family for offline transfers – Use Microsoft-supplied Data Box devices to move large amounts of data to Azure when you're limited by time, network availability, or costs. Copy on-premises data to the device using tools such as Robocopy. Depending on the size of the data you intend to transfer, you can choose Data Box Disk.
- Azure Import/Export – Use the Azure Import/Export service to securely import large amounts of data to Azure Blob storage and Azure Files by shipping your own disk drives. You can also use this service to transfer data from Azure Blob storage to disk drives and ship them to your on-premises sites.
If the network transfer is projected to be reasonable, you can use any of the tools detailed in High network bandwidth.
High network bandwidth (1 Gbps - 100 Gbps)
If the available network bandwidth is high, use one of the following tools.
- AzCopy – Use this command-line tool to copy data to and from Azure Blobs, Files, and Table storage with optimal performance. AzCopy supports concurrency and parallelism, and can resume copy operations that were interrupted.
- Azure Storage REST APIs/SDKs – When building an application, you can develop the application against Azure Storage REST APIs and use the Azure SDKs offered in multiple languages.
- Azure Data Factory – Use Data Factory to scale out a transfer operation, or when you need orchestration and enterprise-grade monitoring capabilities. Use Data Factory to regularly transfer files between several Azure services, on-premises, or a combination of the two. With Data Factory, you can create and schedule data-driven workflows (called pipelines) that ingest data from disparate data stores and automate data movement and data transformation.
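To make the REST API option more concrete, the following sketch builds (but does not send) a Put Blob request for a single block blob using only the Python standard library. It is a minimal illustration, not the recommended implementation: the helper name `build_put_blob_request` and the account, container, and SAS token values are hypothetical placeholders. It assumes authorization through a shared access signature appended to the URL. For large datasets you would more likely use an Azure SDK or a block-based upload (Put Block and Put Block List) than a single request.

```python
from urllib.parse import quote
from urllib.request import Request


def build_put_blob_request(account: str, container: str, blob_name: str,
                           sas_token: str, data: bytes) -> Request:
    """Build a Put Blob request that uploads data as a block blob.

    All argument values are supplied by the caller; the SAS token is
    assumed to grant write access to the target container.
    """
    # Percent-encode the blob name but keep "/" so virtual folders survive.
    path = quote(blob_name)
    url = (f"https://{account}.blob.core.windows.net/"
           f"{container}/{path}?{sas_token.lstrip('?')}")
    headers = {
        "x-ms-blob-type": "BlockBlob",      # required by Put Blob
        "Content-Length": str(len(data)),
    }
    return Request(url, data=data, headers=headers, method="PUT")


if __name__ == "__main__":
    # Hypothetical values for illustration only; nothing is sent.
    req = build_put_blob_request("myaccount", "mycontainer",
                                 "backups/file 1.bin", "?sv=...&sig=...",
                                 b"example payload")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the sketch easy to inspect and test without network access or credentials.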
Comparison of key capabilities
The following tables summarize the differences in key capabilities for the recommended options.
Moderate network bandwidth
If using offline data transfer, use the following table to understand the differences in key capabilities.
Capability | Data Box Disk | Import/Export |
---|---|---|
Data size | Up to 35 TBs | Variable |
Data type | Azure Blobs, Azure Files* | Azure Blobs, Azure Files |
Form factor | 5 SSDs per order | Up to 10 HDDs/SSDs per order |
Initial setup time | Low (15 mins) | Moderate to difficult (variable) |
Send data to Azure | Yes | Yes |
Export data from Azure | No | Yes |
Encryption | AES 128-bit | AES 128-bit |
Hardware | Microsoft supplied | Customer supplied |
Network interface | USB 3.1/SATA | SATA II/SATA III |
Partner integration | Some | Some |
Shipping | Microsoft managed | Customer managed |
Use when data moves | Within a commerce boundary | Across geographic boundaries |
* Data Box Disk does not support Large File Shares and does not preserve file metadata.
If using online data transfer, use the table in the following section for high network bandwidth.
High network bandwidth
Capability | Tools: AzCopy, Azure PowerShell, Azure CLI | Azure Storage REST APIs, SDKs | Azure Data Factory |
---|---|---|---|
Data type | Azure Blobs, Azure Files, Azure Tables | Azure Blobs, Azure Files, Azure Tables | Supports 70+ data connectors for data stores and formats |
Form factor | Command-line tools | Programmatic interface | Service in Azure portal |
Initial one-time setup | Easy | Moderate | Extensive |
Data pre-processing | No | No | Yes |
Transfer from other clouds | No | No | Yes |
User type | IT Pro or dev | Dev | IT Pro |
Pricing | Free, data egress charges apply | Free, data egress charges apply | See Azure Data Factory pricing |