Use Azure Data Factory to migrate data from your data lake or data warehouse to Azure

APPLIES TO: Azure Data Factory (yes), Azure Synapse Analytics (Preview) (no)

If you want to migrate your data lake or enterprise data warehouse (EDW) to Azure, consider using Azure Data Factory. Azure Data Factory is well-suited to the following scenarios:

  • Big data workload migration from Amazon Simple Storage Service (Amazon S3) or an on-premises Hadoop Distributed File System (HDFS) to Azure
  • EDW migration from Oracle Exadata, Netezza, Teradata, or Amazon Redshift to Azure

Azure Data Factory can move petabytes (PB) of data for data lake migration, and tens of terabytes (TB) of data for data warehouse migration.

Why Azure Data Factory can be used for data migration

  • Azure Data Factory can easily scale up its processing power to move data in a serverless manner with high performance, resilience, and scalability. You pay only for what you use. Also note the following:
    • Azure Data Factory has no limitations on data volume or on the number of files.
    • Azure Data Factory can fully use your network and storage bandwidth to achieve the highest data movement throughput in your environment.
    • Azure Data Factory uses a pay-as-you-go method, so that you pay only for the time you actually use to run the data migration to Azure.
  • Azure Data Factory can perform both a one-time historical load and scheduled incremental loads.
  • Azure Data Factory uses the Azure integration runtime (IR) to move data between publicly accessible data lake and warehouse endpoints. It can also use a self-hosted IR to move data for data lake and warehouse endpoints inside an Azure virtual network (VNet) or behind a firewall.
  • Azure Data Factory has enterprise-grade security: You can use managed service identity (MSI) or service identity for secured service-to-service integration, or use Azure Key Vault for credential management.
  • Azure Data Factory provides a code-free authoring experience and a rich, built-in monitoring dashboard.

Online vs. offline data migration

Azure Data Factory is a standard online data migration tool that transfers data over a network (internet, Azure ExpressRoute (ER), or VPN). With offline data migration, by contrast, users physically ship data-transfer devices from their organization to an Azure datacenter.

There are three key considerations when you choose between an online and offline migration approach:

  • Size of data to be migrated
  • Network bandwidth
  • Migration window

For example, assume you plan to use Azure Data Factory to complete your data migration within two weeks (your migration window). Notice the pink/blue cut line in the following table. The lowest pink cell for any given column shows the data size/network bandwidth pairing whose migration window is closest to but less than two weeks. (Any size/bandwidth pairing in a blue cell has an online migration window of more than two weeks.)

[Table image: online vs. offline]

This table helps you determine whether you can meet your intended migration window through online migration (Azure Data Factory) based on the size of your data and your available network bandwidth. If the online migration window is more than two weeks, you'll want to use offline migration.
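The relationship between data size, network bandwidth, and migration window can be sketched with simple arithmetic. The following Python snippet is a minimal illustration, not part of Azure Data Factory: the function names, the decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second), and the `utilization` parameter are all assumptions made for this example. It also ignores protocol overhead and source or sink throttling, so real transfers will typically take longer.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, bandwidth_mbps: float,
                  utilization: float = 1.0) -> float:
    """Estimate the days needed to move data_tb terabytes over a link.

    Assumes decimal units and a constant share of the link (utilization)
    available to the migration. Both are illustrative assumptions.
    """
    if data_tb < 0 or bandwidth_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, 0 < utilization <= 1")
    total_bits = data_tb * 1e12 * 8                   # terabytes -> bits
    bits_per_second = bandwidth_mbps * 1e6 * utilization
    return total_bits / bits_per_second / SECONDS_PER_DAY


def fits_online(data_tb: float, bandwidth_mbps: float,
                window_days: float = 14.0, utilization: float = 1.0) -> bool:
    """Return True if the estimated transfer fits within the migration window."""
    return transfer_days(data_tb, bandwidth_mbps, utilization) <= window_days


if __name__ == "__main__":
    # 100 TB over a fully used 1 Gbps link
    print(f"{transfer_days(100, 1000):.1f} days")   # prints "9.3 days"
    print(fits_online(100, 1000))                   # prints "True"
    # 1 PB over the same link exceeds a two-week window
    print(fits_online(1000, 1000))                  # prints "False"
```

Under these assumptions, 100 TB over 1 Gbps fits inside a two-week window, while 1 PB over the same link does not and would point toward offline migration.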

Note

By using online migration, you can achieve both historical data loading and incremental feeds end-to-end through a single tool. Through this approach, your data can be kept synchronized between the existing store and the new store during the entire migration window. This means you can rebuild your ETL logic on the new store with refreshed data.

Next steps