Apply Transformation module

This article describes a module in Azure Machine Learning designer.

Use this module to modify an input dataset based on a previously computed transformation. This module is necessary if you need to update transformations in inference pipelines.

For example, if you used z-scores to normalize your training data with the Normalize Data module, you would want to use the z-score values computed for training during the scoring phase as well. In Azure Machine Learning, you can save the normalization method as a transform, and then use Apply Transformation to apply the z-score to the input data before scoring.
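
As a minimal sketch of this idea outside the designer, the Python below computes z-score parameters from training values and reuses them on scoring values. The function names `fit_zscore` and `apply_zscore` are hypothetical, and the use of the population standard deviation is an assumption; the Normalize Data module may compute it differently.

```python
from statistics import mean, pstdev


def fit_zscore(values):
    """Compute z-score parameters from training data (hypothetical helper)."""
    # Assumption: population standard deviation; the designer may differ.
    return {"mean": mean(values), "std": pstdev(values)}


def apply_zscore(params, values):
    """Apply previously computed parameters to new data without refitting."""
    std = params["std"] or 1.0  # guard against a constant training column
    return [(v - params["mean"]) / std for v in values]


train = [2.0, 4.0, 6.0, 8.0]
params = fit_zscore(train)

# Scoring data is scaled with the training parameters, not its own statistics.
scoring = [5.0, 10.0]
scaled = apply_zscore(params, scoring)
```

Refitting on the scoring data instead would shift and rescale the inputs differently from what the model saw during training, which is why the saved parameters are reused.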

How to save transformations

The designer lets you save data transformations as datasets so that you can use them in other pipelines.

  1. Select a data transformation module that has successfully run.

  2. Select the Outputs + logs tab.

  3. Find the transformation output, and select Register dataset to save it as a module under the Datasets category in the module palette.
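
By analogy only, saving a transformation so that another pipeline can reuse it resembles persisting its parameters to a file and loading them later. The sketch below uses JSON and hypothetical helpers `save_transform` and `load_transform`; it does not reflect how the designer stores registered datasets internally.

```python
import json
import os
import tempfile


def save_transform(params, path):
    """Write transformation parameters to disk (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_transform(path):
    """Read transformation parameters back for reuse elsewhere."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Illustrative parameters; the column name and values are made up.
params = {"kind": "zscore", "columns": {"age": {"mean": 40.0, "std": 10.0}}}

path = os.path.join(tempfile.mkdtemp(), "zscore_transform.json")
save_transform(params, path)
restored = load_transform(path)
```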

How to use Apply Transformation

  1. Add the Apply Transformation module to your pipeline. You can find this module in the Model Scoring & Evaluation section of the module palette.

  2. Find the saved transformation you want to use under Datasets in the module palette.

  3. Connect the output of the saved transformation to the left input port of the Apply Transformation module.

    The dataset should have exactly the same schema (number of columns, column names, data types) as the dataset for which the transformation was first designed.

  4. Connect the dataset output of the desired module to the right input port of the Apply Transformation module.

  5. To apply the transformation to the new dataset, submit the pipeline.
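
The schema requirement in the steps above can be illustrated with a small sketch that checks column names and types before applying a saved transform. The `schema_of` and `apply_transform` helpers and the dictionary layout of `transform` are assumptions made for illustration, not the designer's implementation.

```python
def schema_of(rows):
    """Return ordered (column name, type name) pairs from the first row."""
    return [(name, type(value).__name__) for name, value in rows[0].items()]


def apply_transform(transform, rows):
    """Apply saved z-score parameters after verifying the schema matches."""
    actual = schema_of(rows)
    if actual != transform["schema"]:
        raise ValueError(
            f"schema mismatch: expected {transform['schema']}, got {actual}"
        )
    out = []
    for row in rows:
        new_row = dict(row)
        for col, p in transform["params"].items():
            new_row[col] = (row[col] - p["mean"]) / p["std"]
        out.append(new_row)
    return out


# Hypothetical saved transform: the schema it was designed for plus parameters.
transform = {
    "schema": [("age", "float"), ("city", "str")],
    "params": {"age": {"mean": 40.0, "std": 10.0}},
}

result = apply_transform(transform, [{"age": 50.0, "city": "Oslo"}])

# A dataset missing a column is rejected instead of being silently transformed.
try:
    apply_transform(transform, [{"age": 50.0}])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```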

Important

To make sure that a transformation updated in the training pipeline also takes effect in inference pipelines, follow these steps each time the transformation is updated in the training pipeline:

  1. In the training pipeline, register the output of the Select Columns Transform as a dataset.
  2. In the inference pipeline, remove the TD- module, and replace it with the dataset registered in the previous step.

Next steps

See the set of modules available to Azure Machine Learning.