Upgrade pipelines to SDK v2

In SDK v2, pipelines are consolidated into jobs.

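A job has a type. Most jobs are command jobs that run a `command`, such as `python main.py`. What runs in a job is agnostic to any programming language, so you can run bash scripts, invoke the Python interpreter, run a series of curl commands, or anything else.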
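As a rough illustration of the language-agnostic point, the sketch below describes three command jobs whose command lines call bash, Python, and curl. The job names, script paths, URL, environment name, and compute name are placeholders rather than values from this article. The `command()` builder from `azure.ai.ml` is imported lazily so the specifications can be inspected without the SDK installed.

```python
# Hypothetical job specifications: only the command line differs between them.
JOB_SPECS = {
    "bash-job": "bash prepare.sh",
    "python-job": "python main.py",
    "curl-job": "curl --fail --output data.csv https://example.com/data.csv",
}


def build_command_jobs(environment, compute, code="./src"):
    """Turn each command line into an SDK v2 command job (nothing is submitted)."""
    # Imported lazily so that this module loads without azure-ai-ml installed.
    from azure.ai.ml import command

    return {
        name: command(
            display_name=name,
            command=command_line,
            code=code,                # local folder uploaded with the job
            environment=environment,  # placeholder, e.g. "<environment-name>@latest"
            compute=compute,          # placeholder, e.g. "cpu-cluster"
        )
        for name, command_line in JOB_SPECS.items()
    }
```

Each returned job could then be submitted with `ml_client.jobs.create_or_update(job)`, the same call that the SDK v2 pipeline example in this article uses.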

A `pipeline` is another type of job. It defines child jobs that may have input/output relationships, forming a directed acyclic graph (DAG).
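To make the DAG idea concrete, here is a small standard-library sketch, not SDK code. Each child job lists the outputs it consumes, and the execution order follows from those input/output relationships. The job and output names mirror the train, score, and evaluate example used in this article; the dictionary layout is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Each child job declares what it produces and which outputs it consumes.
CHILD_JOBS = {
    "train": {"inputs": [], "outputs": ["model_output"]},
    "score": {"inputs": ["model_output"], "outputs": ["score_output"]},
    "eval": {"inputs": ["score_output"], "outputs": ["eval_output"]},
}


def execution_order(jobs):
    """Return job names in an order that respects input/output dependencies."""
    # Look up which job produces each named output.
    producer = {out: name for name, job in jobs.items() for out in job["outputs"]}
    # Map every job to the set of jobs whose outputs it reads.
    graph = {
        name: {producer[i] for i in job["inputs"] if i in producer}
        for name, job in jobs.items()
    }
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(CHILD_JOBS))  # ['train', 'score', 'eval']
```

In the SDK itself no explicit ordering is written down: passing one step's output as another step's input is what creates the edge.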

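To upgrade, change the code that defines and submits your pipelines to SDK v2. What you run inside the child jobs doesn't need to be upgraded to SDK v2. However, remove any code specific to Azure Machine Learning from your model training scripts. This separation makes it easier to move between local and cloud execution and is considered a best practice for mature MLOps. In practice, it means removing `azureml.*` lines of code and replacing model logging and tracking code with MLflow. For more information, see How to use MLflow in v2.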
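The following is a minimal sketch of what a training script could look like once the `azureml.*` lines are removed. It assumes a hypothetical `train.py` whose argument names follow the pipeline examples in this article. The "training" is a stand-in computation, and MLflow is treated as optional so the same file also runs on a machine without it.

```python
import argparse

try:
    import mlflow  # optional: the same calls work locally and in the cloud
except ImportError:
    mlflow = None


def parse_args(argv=None):
    """Parse the arguments that the pipeline step passes on the command line."""
    parser = argparse.ArgumentParser(description="Dummy training script")
    parser.add_argument("--training_data", required=True)
    parser.add_argument("--max_epochs", type=int, default=5)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--model_output", default="./model_output")
    return parser.parse_args(argv)


def train(max_epochs, learning_rate):
    """Stand-in for real training: returns one fake loss value per epoch."""
    return [1.0 / (1.0 + learning_rate * epoch) for epoch in range(1, max_epochs + 1)]


def main(argv=None):
    args = parse_args(argv)
    losses = train(args.max_epochs, args.learning_rate)
    if mlflow is not None:
        # Plain MLflow calls take the place of v1-style run.log(...) calls.
        mlflow.log_param("learning_rate", args.learning_rate)
        for epoch, loss in enumerate(losses, start=1):
            mlflow.log_metric("loss", loss, step=epoch)
    return losses
```

In a real script, `main()` would be called under an `if __name__ == "__main__":` guard. Because nothing in the file imports `azureml`, it can run unchanged on a laptop or as the command of a v2 job.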

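This article compares scenarios in SDK v1 and SDK v2. In the following examples, three steps (train, score, and evaluate) are built into a dummy pipeline job. The comparison shows how to build pipeline jobs with SDK v1 and SDK v2, and how to consume data and pass data between steps.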

Run a pipeline

SDK v1

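Important

Azure Machine Learning SDK v1 (azureml-core) is deprecated as of March 31, 2025. Support ends on June 30, 2026. The following code is shown for comparison only. Use the SDK v2 examples for new work. For more information, see Upgrade to v2.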

# import required libraries
import os
import azureml.core
from azureml.core import (
    Workspace,
    Dataset,
    Datastore,
    ComputeTarget,
    Experiment,
    ScriptRunConfig,
)
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import Pipeline

# check core SDK version number
print("Azure Machine Learning SDK Version: ", azureml.core.VERSION)

# load workspace
workspace = Workspace.from_config()
print(
    "Workspace name: " + workspace.name,
    "Azure region: " + workspace.location,
    "Subscription id: " + workspace.subscription_id,
    "Resource group: " + workspace.resource_group,
    sep="\n",
)

# create an ML experiment
experiment = Experiment(workspace=workspace, name="train_score_eval_pipeline")

# create a directory
script_folder = "./src"

# create compute
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
amlcompute_cluster_name = "cpu-cluster"

# Verify that cluster does not exist already
try:
    aml_compute = ComputeTarget(workspace=workspace, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',
                                                           max_nodes=4)
    aml_compute = ComputeTarget.create(workspace, amlcompute_cluster_name, compute_config)

aml_compute.wait_for_completion(show_output=True)

# define data set
data_urls = ["wasbs://demo@dprepdata.blob.core.chinacloudapi.cn/Titanic.csv"]
input_ds = Dataset.File.from_files(data_urls)

# define steps in pipeline
from azureml.data import OutputFileDatasetConfig
model_output = OutputFileDatasetConfig('model_output')
train_step = PythonScriptStep(
    name="train step",
    script_name="train.py",
    arguments=['--training_data', input_ds.as_named_input('training_data').as_mount() ,'--max_epocs', 5, '--learning_rate', 0.1,'--model_output', model_output],
    source_directory=script_folder,
    compute_target=aml_compute,
    allow_reuse=True,
)

score_output = OutputFileDatasetConfig('score_output')
score_step = PythonScriptStep(
    name="score step",
    script_name="score.py",
    arguments=['--model_input',model_output.as_input('model_input'), '--test_data', input_ds.as_named_input('test_data').as_mount(), '--score_output', score_output],
    source_directory=script_folder,
    compute_target=aml_compute,
    allow_reuse=True,
)

eval_output = OutputFileDatasetConfig('eval_output')
eval_step = PythonScriptStep(
    name="eval step",
    script_name="eval.py",
    arguments=['--scoring_result',score_output.as_input('scoring_result'), '--eval_output', eval_output],
    source_directory=script_folder,
    compute_target=aml_compute,
    allow_reuse=True,
)

# build the pipeline (Pipeline is already imported at the top)

pipeline_steps = [train_step, score_step, eval_step]

pipeline = Pipeline(workspace = workspace, steps=pipeline_steps)
print("Pipeline is built.")

pipeline_run = experiment.submit(pipeline, regenerate_outputs=False)

print("Pipeline submitted for execution.")

SDK v2. Full sample link.

# import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input
from azure.ai.ml.dsl import pipeline

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.chinacloudapi.cn/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "cpu-cluster"
print(ml_client.compute.get(cluster_name))
# Tip: You can skip provisioning a cluster by using serverless compute.
# Replace `default_compute=cluster_name` with `default_compute="serverless"`
# in the @pipeline decorator below. See: /machine-learning/how-to-use-serverless-compute

# Import components that are defined with Python function
with open("src/components.py") as fin:
    print(fin.read())

# You need to install mldesigner package to use command_component decorator.
# Option 1: install directly
# !pip install mldesigner

# Option 2: install as an extra dependency of azure-ai-ml
# !pip install azure-ai-ml[designer]

# import the components as functions
from src.components import train_model, score_data, eval_model

cluster_name = "cpu-cluster"
# define a pipeline with component
@pipeline(default_compute=cluster_name)
def pipeline_with_python_function_components(input_data, test_data, learning_rate):
    """E2E dummy train-score-eval pipeline with components defined via Python function components"""

    # Call component obj as function: apply given inputs & parameters to create a node in pipeline
    train_with_sample_data = train_model(
        training_data=input_data, max_epochs=5, learning_rate=learning_rate
    )

    score_with_sample_data = score_data(
        model_input=train_with_sample_data.outputs.model_output, test_data=test_data
    )

    eval_with_sample_data = eval_model(
        scoring_result=score_with_sample_data.outputs.score_output
    )

    # Return: pipeline outputs
    return {
        "eval_output": eval_with_sample_data.outputs.eval_output,
        "model_output": train_with_sample_data.outputs.model_output,
    }


pipeline_job = pipeline_with_python_function_components(
    input_data=Input(
        path="wasbs://demo@dprepdata.blob.core.chinacloudapi.cn/Titanic.csv", type="uri_file"
    ),
    test_data=Input(
        path="wasbs://demo@dprepdata.blob.core.chinacloudapi.cn/Titanic.csv", type="uri_file"
    ),
    learning_rate=0.1,
)

# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="train_score_eval_pipeline"
)
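After submission, a common next step is to wait for the pipeline and check how it ended. The helper below is a sketch that relies only on the `jobs.stream` and `jobs.get` operations of `MLClient` and on the `name`, `studio_url`, and `status` attributes of a job. The function name `wait_for_job` is made up for this example, and the client is passed in as a parameter so the helper can be exercised with a stand-in object.

```python
def wait_for_job(ml_client, job):
    """Block until the job finishes, then return its final status string."""
    # studio_url points at the job page in Azure Machine Learning studio.
    print("Monitor the job at:", job.studio_url)
    # stream() prints the job logs and returns when the job has finished;
    # the SDK may raise an exception instead if the job failed.
    ml_client.jobs.stream(job.name)
    # Re-fetch the job to read the status reported by the service.
    return ml_client.jobs.get(job.name).status
```

With the objects from the SDK v2 example, this would be called as `wait_for_job(ml_client, pipeline_job)`.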

Mapping of key functionality in SDK v1 and SDK v2

Functionality in SDK v1	Rough mapping in SDK v2
azureml.pipeline.core.Pipeline	azure.ai.ml.dsl.pipeline
OutputDatasetConfig	Output
dataset as_mount	Input
StepSequence	Data dependency

Step and job/component type mapping

Step in SDK v1	Job type in SDK v2	Component type in SDK v2
`adla_step`	None	None
`automl_step`	`automl` job	`automl` component
`azurebatch_step`	None	None
`command_step`	`command` job	`command` component
`data_transfer_step`	None	None
`databricks_step`	None	None
`estimator_step`	`command` job	`command` component
`hyper_drive_step`	`sweep` job	None
`kusto_step`	None	None
`module_step`	None	`command` component
`mpi_step`	`command` job	`command` component
`parallel_run_step`	`Parallel` job	`Parallel` component
`python_script_step`	`command` job	`command` component
`r_script_step`	`command` job	`command` component
`synapse_spark_step`	`spark` job	`spark` component
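When planning an upgrade, the table above can be treated as a lookup. The sketch below copies its rows into a dictionary and reports which v1 steps in a pipeline have no v2 job type. The helper name `steps_without_v2_job` is hypothetical, and the dictionary is only as current as the table it was copied from.

```python
# v1 step -> (v2 job type, v2 component type); None means no mapping is listed.
V1_STEP_TO_V2 = {
    "adla_step": (None, None),
    "automl_step": ("automl", "automl"),
    "azurebatch_step": (None, None),
    "command_step": ("command", "command"),
    "data_transfer_step": (None, None),
    "databricks_step": (None, None),
    "estimator_step": ("command", "command"),
    "hyper_drive_step": ("sweep", None),
    "kusto_step": (None, None),
    "module_step": (None, "command"),
    "mpi_step": ("command", "command"),
    "parallel_run_step": ("Parallel", "Parallel"),
    "python_script_step": ("command", "command"),
    "r_script_step": ("command", "command"),
    "synapse_spark_step": ("spark", "spark"),
}


def steps_without_v2_job(v1_steps):
    """Return the steps that have no v2 job type (KeyError for unknown names)."""
    return [step for step in v1_steps if V1_STEP_TO_V2[step][0] is None]
```

For example, a pipeline that uses `python_script_step` and `databricks_step` would return only `databricks_step`, flagging the one step that needs a different approach in v2.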

For more information, see the following documentation:

Last updated on 2026-04-22
