Mapping data flows in Azure Data Factory

Applies to: Azure Data Factory, Azure Synapse Analytics

What are mapping data flows?

Mapping data flows are visually designed data transformations in Azure Data Factory. Data flows allow data engineers to develop data transformation logic without writing code. The resulting data flows are executed as activities within Azure Data Factory pipelines that use scaled-out Apache Spark clusters. Data flow activities can be operationalized using existing Azure Data Factory scheduling, control flow, and monitoring capabilities.

Mapping data flows provide an entirely visual experience with no coding required. Your data flows run on ADF-managed execution clusters for scaled-out data processing. Azure Data Factory handles all the code translation, path optimization, and execution of your data flow jobs.

Getting started

Data flows are created from the Factory Resources pane, in the same way as pipelines and datasets. To create a data flow, select the plus sign next to Factory Resources, and then select Data Flow.

This action takes you to the data flow canvas, where you can create your transformation logic. Select Add source to start configuring your source transformation. For more information, see Source transformation.

Authoring data flows

Mapping data flow has a unique authoring canvas designed to make building transformation logic easy. The data flow canvas is separated into three parts: the top bar, the graph, and the configuration panel.

Screenshot shows the data flow canvas with top bar, graph, and configuration panel labeled.

Graph

The graph displays the transformation stream. It shows the lineage of source data as it flows into one or more sinks. To add a new source, select Add source. To add a new transformation, select the plus sign on the lower right of an existing transformation. Learn more about how to manage the data flow graph.

Screenshot shows the graph section of the canvas, which includes a Search text box.
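
To make the idea of lineage concrete, the following is a minimal Python sketch of a graph in which sources flow through transformations into sinks. It is only an illustration of the concept, not how Azure Data Factory represents the graph internally, and all node names (`source1`, `filter1`, `join1`, `sink1`, `sink2`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical data flow graph: each edge points from one step
# to the step immediately downstream of it.
edges = [
    ("source1", "filter1"),
    ("source2", "join1"),
    ("filter1", "join1"),
    ("join1", "sink1"),
    ("filter1", "sink2"),
]


def upstream_sources(edges, node):
    """Return the nodes without inputs that eventually flow into `node`."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)

    found, seen, stack = set(), set(), [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if not parents[current]:
            # A node with no parents is a source.
            found.add(current)
        else:
            stack.extend(parents[current])
    return found


# Lineage of each sink, expressed as the sources that feed it.
lineage = {sink: sorted(upstream_sources(edges, sink)) for sink in ("sink1", "sink2")}
print(lineage)
```

In this sketch, `sink1` is fed by both sources through the join, while `sink2` only receives data from `source1`.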

Configuration panel

The configuration panel shows the settings specific to the currently selected transformation. If no transformation is selected, it shows the settings for the data flow as a whole. In the overall data flow configuration, you can add parameters via the Parameters tab. For more information, see Mapping data flow parameters.

Each transformation contains at least four configuration tabs.

Transformation settings

The first tab in each transformation's configuration pane contains the settings specific to that transformation. For more information, see that transformation's documentation page.

Optimize

The Optimize tab contains settings to configure partitioning schemes. To learn more about how to optimize your data flows, see the mapping data flow performance guide.

Screenshot shows the Optimize tab, which includes Partition option, Partition type, and Number of partitions.
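
As a rough illustration of what a partitioning scheme decides, the Python sketch below distributes rows across a fixed number of partitions in two ways: round robin and hashing on a key column. It is a conceptual sketch only and does not reproduce how Spark or Azure Data Factory partitions data. The sample rows, the `country` column, and the use of CRC32 as the hash function are all assumptions made for the example.

```python
import zlib


def round_robin(rows, num_partitions):
    """Spread rows evenly, ignoring their content."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def hash_partition(rows, key, num_partitions):
    """Send rows with the same key value to the same partition."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is used here only because it is deterministic across runs.
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        parts[idx].append(row)
    return parts


# Hypothetical sample data.
rows = [
    {"country": c, "amount": a}
    for c, a in [("US", 10), ("DE", 20), ("US", 30), ("FR", 40), ("DE", 50)]
]

rr = round_robin(rows, 2)
hp = hash_partition(rows, "country", 2)

print([len(p) for p in rr])
print([sorted({r["country"] for r in p}) for p in hp])
```

Round robin keeps partition sizes balanced, while hashing keeps rows that share a key together, which can leave partitions uneven when the key values are skewed.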

Inspect

The Inspect tab provides a view into the metadata of the data stream that you're transforming. You can see column counts, the columns changed, the columns added, data types, the column order, and column references. Inspect is a read-only view of your metadata. You don't need to have debug mode enabled to see metadata in the Inspect pane.

As you change the shape of your data through transformations, you'll see the metadata changes flow through in the Inspect pane. If there isn't a defined schema in your source transformation, then metadata won't be visible in the Inspect pane. Lack of metadata is common in schema drift scenarios.
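
The kind of metadata summary described above can be sketched in a few lines of Python. This is a hypothetical illustration of comparing a schema before and after a transformation. It is not the Inspect pane's implementation, and the column names and type names are invented for the example.

```python
def describe_changes(before, after):
    """Summarize how a transformation changed a schema.

    `before` and `after` map column names to type names, in column order.
    """
    added = [c for c in after if c not in before]
    removed = [c for c in before if c not in after]
    type_changed = [c for c in after if c in before and before[c] != after[c]]
    return {
        "column_count": len(after),
        "added": added,
        "removed": removed,
        "type_changed": type_changed,
    }


# Hypothetical schemas on either side of a derived-column style step.
before = {"id": "integer", "name": "string", "price": "string"}
after = {
    "id": "integer",
    "name": "string",
    "price": "double",
    "price_with_tax": "double",
}

changes = describe_changes(before, after)
print(changes)
```

If the source has no defined schema, `before` would be empty and there would be nothing to compare, which mirrors the schema drift case mentioned above.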

Data preview

If debug mode is on, the Data Preview tab gives you an interactive snapshot of the data at each transformation. For more information, see Data preview in debug mode.

Top bar

The top bar contains actions that affect the whole data flow, like saving and validation. You can also view the underlying JSON code and data flow script of your transformation logic. For more information, learn about the data flow script.

Available transformations

View the mapping data flow transformation overview to get a list of available transformations.

Data flow activity

Mapping data flows are operationalized within ADF pipelines using the data flow activity. All a user has to do is specify which integration runtime to use and pass in parameter values. For more information, learn about the Azure integration runtime.
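
As a sketch of what specifying an integration runtime and passing parameter values might look like in a pipeline definition, the Python snippet below builds a data flow activity as a dictionary and prints it as JSON. The layout follows the general shape of the Execute Data Flow activity JSON as best understood here, so check it against the activity reference before relying on it. The names `TransformSales`, `SalesDataFlow`, `DataFlowRuntime`, and `inputPath`, and the form of the parameter value, are hypothetical.

```python
import json

# Hypothetical activity definition; verify property names against the
# Execute Data Flow activity reference before using it in a real pipeline.
activity = {
    "name": "TransformSales",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        # Which data flow to run, plus the parameter values passed to it.
        "dataflow": {
            "referenceName": "SalesDataFlow",
            "type": "DataFlowReference",
            "parameters": {
                # Assumed parameter name and expression-style value.
                "inputPath": {"value": "'raw/sales'", "type": "Expression"},
            },
        },
        # Which integration runtime provides the execution cluster.
        "integrationRuntime": {
            "referenceName": "DataFlowRuntime",
            "type": "IntegrationRuntimeReference",
        },
    },
}

activity_json = json.dumps(activity, indent=2)
print(activity_json)
```

Building the definition as a dictionary first makes it easy to validate or template the values before serializing them.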

Debug mode

Debug mode allows you to interactively see the results of each transformation step while you build and debug your data flows. The debug session can be used both when building your data flow logic and when running pipeline debug runs with data flow activities. To learn more, see the debug mode documentation.

Monitoring data flows

Mapping data flow integrates with existing Azure Data Factory monitoring capabilities. To learn how to interpret data flow monitoring output, see Monitoring mapping data flows.

The Azure Data Factory team has created a performance tuning guide to help you optimize the execution time of your data flows after building your business logic.

Next steps