Understanding Data Factory pricing through examples

This article explains and demonstrates the Azure Data Factory pricing model with detailed examples.

Note

The prices used in these examples are hypothetical and are not intended to imply actual pricing.

Copy data from AWS S3 to Azure Blob storage hourly

In this scenario, you want to copy data from AWS S3 to Azure Blob storage on an hourly schedule.

To accomplish the scenario, you need to create a pipeline with the following items:

  1. A copy activity with an input dataset for the data to be copied from AWS S3.

  2. An output dataset for the data on Azure Storage.

  3. A schedule trigger to execute the pipeline every hour.

Scenario 1

Operations | Types and Units
Create Linked Service | 2 Read/Write entities
Create Datasets | 4 Read/Write entities (2 for dataset creation, 2 for linked service references)
Create Pipeline | 3 Read/Write entities (1 for pipeline creation, 2 for dataset references)
Get Pipeline | 1 Read/Write entity
Run Pipeline | 2 Activity runs (1 for trigger run, 1 for activity run)
Copy Data (assumption: execution time = 10 min) | 10 * 4 Azure Integration Runtime (default DIU setting = 4). For more information on data integration units and optimizing copy performance, see this article.
Monitor Pipeline (assumption: only 1 run occurred) | 2 Monitoring run records retrieved (1 for pipeline run, 1 for activity run)

Total Scenario pricing: ¥0.44547136

  • Data Factory Operations = ¥0.00111936
    • Read/Write = 10 * 0.00010176 = ¥0.0010176 [1 Read/Write = ¥5.088/50000 = ¥0.00010176]
    • Monitoring = 2 * 0.00005088 = ¥0.00010176 [1 Monitoring = ¥2.544/50000 = ¥0.00005088]
  • Pipeline Orchestration & Execution = ¥0.444352
    • Activity Runs = 0.010176 * 2 = ¥0.020352 [1 run = ¥10.176/1000 = ¥0.010176]
    • Data Movement Activities = ¥0.424 (prorated for 10 minutes of execution time at ¥2.544/hour on the Azure Integration Runtime)
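To make the arithmetic easy to verify, here is a minimal Python sketch that reproduces the scenario 1 figures from the hypothetical unit prices quoted above; the variable names are illustrative only, and the rates are the article's hypothetical prices, not actual Azure pricing.

```python
# Hypothetical unit prices quoted in this article (not actual Azure pricing).
READ_WRITE_PRICE = 5.088 / 50_000       # ¥ per read/write entity operation
MONITORING_PRICE = 2.544 / 50_000       # ¥ per monitoring run record
ACTIVITY_RUN_PRICE = 10.176 / 1_000     # ¥ per activity run
DATA_MOVEMENT_PRICE_PER_HOUR = 2.544    # ¥ per hour on the Azure Integration Runtime

# Scenario 1: 10 read/write operations, 2 monitoring records,
# 2 activity runs, and a copy activity that executes for 10 minutes.
operations = 10 * READ_WRITE_PRICE + 2 * MONITORING_PRICE   # ≈ ¥0.00111936
orchestration = 2 * ACTIVITY_RUN_PRICE                      # ≈ ¥0.020352
data_movement = (10 / 60) * DATA_MOVEMENT_PRICE_PER_HOUR    # ≈ ¥0.424

total = operations + orchestration + data_movement
print(f"Scenario 1 total: ¥{total:.8f}")                    # ≈ ¥0.44547136
```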

Copy data and transform with Azure HDInsight hourly

In this scenario, you want to copy data from AWS S3 to Azure Blob storage and transform the data with Azure HDInsight on an hourly schedule.

To accomplish the scenario, you need to create a pipeline with the following items:

  1. One copy activity with an input dataset for the data to be copied from AWS S3, and an output dataset for the data on Azure Storage.
  2. One Azure HDInsight activity for the data transformation.
  3. One schedule trigger to execute the pipeline every hour.

Scenario 2

Operations | Types and Units
Create Linked Service | 3 Read/Write entities
Create Datasets | 4 Read/Write entities (2 for dataset creation, 2 for linked service references)
Create Pipeline | 3 Read/Write entities (1 for pipeline creation, 2 for dataset references)
Get Pipeline | 1 Read/Write entity
Run Pipeline | 3 Activity runs (1 for trigger run, 2 for activity runs)
Copy Data (assumption: execution time = 10 min) | 10 * 4 Azure Integration Runtime (default DIU setting = 4). For more information on data integration units and optimizing copy performance, see this article.
Monitor Pipeline (assumption: only 1 run occurred) | 3 Monitoring run records retrieved (1 for pipeline run, 2 for activity runs)
Execute Azure HDInsight activity (assumption: execution time = 10 min) | 10 min External Pipeline Activity execution

Total Scenario pricing: ¥0.45622333

  • Data Factory Operations = ¥0.001272
    • Read/Write = 11 * 0.00010176 = ¥0.00111936 [1 Read/Write = ¥5.088/50000 = ¥0.00010176]
    • Monitoring = 3 * 0.00005088 = ¥0.00015264 [1 Monitoring = ¥2.544/50000 = ¥0.00005088]
  • Pipeline Orchestration & Execution = ¥0.45495133
    • Activity Runs = 0.010176 * 3 = ¥0.030528 [1 run = ¥10.176/1000 = ¥0.010176]
    • Data Movement Activities = ¥0.424 (prorated for 10 minutes of execution time at ¥2.544/hour on the Azure Integration Runtime)
    • External Pipeline Activity = ¥0.00042333 (prorated for 10 minutes of execution time at ¥0.00254/hour on the Azure Integration Runtime)
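As a quick check of the total above, a minimal sketch using the same hypothetical prices as scenario 1:

```python
# Scenario 2 check, using the article's hypothetical unit prices.
operations = 11 * (5.088 / 50_000) + 3 * (2.544 / 50_000)   # ≈ ¥0.001272
activity_runs = 3 * (10.176 / 1_000)                        # ≈ ¥0.030528
data_movement = (10 / 60) * 2.544                           # ≈ ¥0.424
external_activity = (10 / 60) * 0.00254                     # ≈ ¥0.00042333

total = operations + activity_runs + data_movement + external_activity
print(f"Scenario 2 total: ¥{total:.8f}")                    # ≈ ¥0.45622333
```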

Copy data and transform with dynamic parameters hourly

In this scenario, you want to copy data from AWS S3 to Azure Blob storage and transform the data with Azure HDInsight (with dynamic parameters in the script) on an hourly schedule.

To accomplish the scenario, you need to create a pipeline with the following items:

  1. One copy activity with an input dataset for the data to be copied from AWS S3, and an output dataset for the data on Azure Storage.
  2. One Lookup activity for passing parameters dynamically to the transformation script.
  3. One Azure HDInsight activity for the data transformation.
  4. One schedule trigger to execute the pipeline every hour.

Scenario 3

Operations | Types and Units
Create Linked Service | 3 Read/Write entities
Create Datasets | 4 Read/Write entities (2 for dataset creation, 2 for linked service references)
Create Pipeline | 3 Read/Write entities (1 for pipeline creation, 2 for dataset references)
Get Pipeline | 1 Read/Write entity
Run Pipeline | 4 Activity runs (1 for trigger run, 3 for activity runs)
Copy Data (assumption: execution time = 10 min) | 10 * 4 Azure Integration Runtime (default DIU setting = 4). For more information on data integration units and optimizing copy performance, see this article.
Monitor Pipeline (assumption: only 1 run occurred) | 4 Monitoring run records retrieved (1 for pipeline run, 3 for activity runs)
Execute Lookup activity (assumption: execution time = 1 min) | 1 min Pipeline Activity execution
Execute Azure HDInsight activity (assumption: execution time = 10 min) | 10 min External Pipeline Activity execution

Total Scenario pricing: ¥0.46729854

  • Data Factory Operations = ¥0.00132288
    • Read/Write = 11 * 0.00010176 = ¥0.00111936 [1 Read/Write = ¥5.088/50000 = ¥0.00010176]
    • Monitoring = 4 * 0.00005088 = ¥0.00020352 [1 Monitoring = ¥2.544/50000 = ¥0.00005088]
  • Pipeline Orchestration & Execution = ¥0.46597566
    • Activity Runs = 0.010176 * 4 = ¥0.040704 [1 run = ¥10.176/1000 = ¥0.010176]
    • Data Movement Activities = ¥0.424 (prorated for 10 minutes of execution time at ¥2.544/hour on the Azure Integration Runtime)
    • Pipeline Activity = ¥0.00084833 (prorated for 1 minute of execution time at ¥0.0509/hour on the Azure Integration Runtime)
    • External Pipeline Activity = ¥0.00042333 (prorated for 10 minutes of execution time at ¥0.00254/hour on the Azure Integration Runtime)
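The same calculation generalizes across all three scenarios. Below is a minimal sketch of a helper that estimates a run's cost from the counts and durations used in this article; the function name and parameters are illustrative, and the rates are the article's hypothetical prices rather than actual Azure pricing.

```python
def estimate_pipeline_cost(read_writes: int,
                           monitoring_records: int,
                           activity_runs: int,
                           data_movement_minutes: float = 0,
                           pipeline_activity_minutes: float = 0,
                           external_activity_minutes: float = 0) -> float:
    """Estimate a pipeline-run cost in ¥ from the hypothetical rates in this article."""
    cost = read_writes * 5.088 / 50_000                   # read/write entity operations
    cost += monitoring_records * 2.544 / 50_000           # monitoring run records
    cost += activity_runs * 10.176 / 1_000                # orchestration (activity runs)
    cost += (data_movement_minutes / 60) * 2.544          # data movement on Azure IR
    cost += (pipeline_activity_minutes / 60) * 0.0509     # pipeline activity execution
    cost += (external_activity_minutes / 60) * 0.00254    # external pipeline activity
    return cost

# Scenario 3: 11 read/writes, 4 monitoring records, 4 activity runs,
# a 10 min copy, a 1 min Lookup, and a 10 min HDInsight activity.
print(f"¥{estimate_pipeline_cost(11, 4, 4, 10, 1, 10):.8f}")
# ≈ ¥0.46729855 (the ¥0.46729854 figure above truncates each prorated item)
```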

Next steps

Now that you understand the pricing for Azure Data Factory, you can get started!