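Set up AutoML to train a time-series forecasting model with the SDK and CLI

Applies to: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

Automated machine learning (AutoML) in Azure Machine Learning uses standard machine learning models along with well-known time-series models to create forecasts. This approach combines historical information about the target variable with user-provided features in the input data and automatically engineered features. Model search algorithms help identify the models with the best predictive accuracy. For more information, see Forecasting methodology and Model sweeping and selection.

This article describes how to set up AutoML for time-series forecasting by using the Azure Machine Learning Python SDK and the Azure CLI. The process includes preparing data for training and configuring time-series parameters in a forecasting job (class reference). You then train, inference, and evaluate models by using components and pipelines.

For a low-code experience, see Tutorial: Forecast demand with automated machine learning. That tutorial provides a time-series forecasting example that uses AutoML in Azure Machine Learning studio.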

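Prerequisites

An Azure Machine Learning workspace. For more information, see Create workspace resources.
The ability to start AutoML training jobs. For more information, see Set up AutoML training for tabular data with the Azure Machine Learning CLI and Python SDK.

Prepare training and validation data

Input data for AutoML forecasting must contain a valid time series in tabular format. Each variable must have its own corresponding column in the data table. AutoML requires at least two columns: a time column that represents the time axis and a target column that represents the quantity to forecast. Other columns can serve as predictors. For more information, see How AutoML uses your data.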
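Before you build the MLTable, it can help to sanity-check the table locally. The following is a minimal sketch, not part of AutoML: it assumes rows are loaded as dictionaries (for example, with csv.DictReader), confirms that the time and target columns exist, and reports timestamps that repeat within a series. The helper name, the column names, and the optional series ID columns are hypothetical.

```python
from collections import Counter
from datetime import datetime


def find_duplicate_timestamps(rows, time_col, target_col, id_cols=()):
    """Return (series_id, timestamp) pairs that occur more than once."""
    # Every row must carry the time column, the target column, and any ID columns.
    for i, row in enumerate(rows):
        missing = [c for c in (time_col, target_col, *id_cols) if c not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns: {missing}")
    # Count how often each (series, timestamp) combination appears.
    counts = Counter(
        (tuple(row[c] for c in id_cols), datetime.fromisoformat(row[time_col]))
        for row in rows
    )
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical sample rows; store "A" repeats the 02:00 timestamp.
sample_rows = [
    {"timestamp": "2017-09-08T01:00:00", "store": "A", "demand": "10"},
    {"timestamp": "2017-09-08T02:00:00", "store": "A", "demand": "12"},
    {"timestamp": "2017-09-08T02:00:00", "store": "A", "demand": "13"},
    {"timestamp": "2017-09-08T02:00:00", "store": "B", "demand": "7"},
]
duplicates = find_duplicate_timestamps(sample_rows, "timestamp", "demand", ["store"])
print(duplicates)
```

An empty result means every series has at most one row per timestamp, which is one common precondition for a valid time series.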

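Important

When you train a model to forecast future values, ensure that all the features used in training are also available when you run predictions for your intended horizon.

Consider a feature for the current stock price, which can increase training accuracy. If you forecast with a long horizon, you might not be able to accurately predict the future stock values that correspond to future time-series points. This limitation can reduce model accuracy.

AutoML forecasting jobs require that your training data is represented as an MLTable object. An MLTable object specifies a data source and the steps for loading the data. For more information and use cases, see Working with tables.

For the following example, assume your training data is contained in a CSV file in a local directory: ./train_data/timeseries_train.csv.

Tip

For complete working examples with sample datasets, see the AutoML forecasting sample notebooks in the Azure Machine Learning examples repository. The energy demand forecasting notebook includes sample training data that you can use to follow along.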

Python SDK
Azure CLI

You can create an MLTable object by using the mltable Python package:

import mltable

paths = [
    {'file': './train_data/timeseries_train.csv'}
]

train_table = mltable.from_delimited_files(paths)
train_table.save('./train_data')

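This code creates a new file, ./train_data/MLTable, which contains the file format and loading instructions.

To start the training job, define an input data object by using the Python SDK: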

from azure.ai.ml.constants import AssetTypes
from azure.ai.ml import Input

# Training MLTable defined locally, with local data to be uploaded.
my_training_data_input = Input(
    type=AssetTypes.MLTABLE, path="./train_data"
)

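You can define a new MLTable object by copying the following YAML text into a new file named MLTable in the same directory as the CSV file.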

$schema: https://azuremlschemas.azureedge.net/latest/MLTable.schema.json

type: mltable
paths:
    - file: ./timeseries_train.csv

transformations:
    - read_delimited:
        delimiter: ','
        encoding: ascii

Begin building the YAML configuration for the AutoML job and specify the training data, as shown in the following example:

$schema: https://azuremlschemas.azureedge.net/latest/autoMLForecastingJob.schema.json
type: automl

experiment_name: cli-v2-automl-forecasting-job
description: A time-series forecasting AutoML job
task: forecasting

# Training data MLTable for the AutoML job.
training_data:
    path: "./train_data"
    type: mltable

validation_data:
    # Optional validation data.

compute: # Compute for training job.
primary_metric: # Primary metric.  

target_column_name: # Target column name.
n_cross_validations: # Cross-validation setting.

limits:
    # Limit settings.

forecasting:
    # Forecasting-specific settings.

training:
    # Training settings.

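You add more detail to this configuration in later sections of this article. The examples assume that the configuration is stored at the path ./automl-forecasting-job.yml.

Specify validation data in a similar way: create an MLTable object and specify a validation data input. Alternatively, if you don't supply validation data, AutoML automatically creates cross-validation splits from your training data to use for model selection.

Create compute to run the experiment

AutoML uses Azure Machine Learning compute, a fully managed compute resource, to run the training job.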

Python SDK
Azure CLI

The following example creates a compute cluster named cpu-cluster.

from azure.ai.ml.entities import AmlCompute

# specify aml compute name.
cpu_compute_target = "cpu-cluster"

try:
    ml_client.compute.get(cpu_compute_target)
except Exception:
    print("Creating a new cpu compute target...")
    compute = AmlCompute(
        name=cpu_compute_target, size="STANDARD_D2_V2", min_instances=0, max_instances=4
    )
    ml_client.compute.begin_create_or_update(compute).result()

Use the following Azure CLI command to create a new compute named cpu-compute:

az ml compute create -n cpu-compute --type amlcompute --min-instances 0 --max-instances 4

Reference the compute in the job definition, as shown in the following example:

$schema: https://azuremlschemas.azureedge.net/latest/autoMLForecastingJob.schema.json
type: automl

experiment_name: cli-v2-automl-forecasting-job
description: A time-series forecasting AutoML job
task: forecasting

# Set training data MLTable for the AutoML job.
training_data:
    path: "./train_data"
    type: mltable

# Set compute for the training job to use. 
compute: azureml:cpu-compute

primary_metric: # Primary metric.  

target_column_name: # Target column name.
n_cross_validations: # Cross-validation setting.

limits:
    # Limit settings.

forecasting:
    # Forecasting-specific settings.

training:
    # Training settings.

Configure the experiment

The following examples show how to configure the experiment.

Python SDK
Azure CLI

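Use the AutoML factory functions to configure forecasting jobs in the Python SDK. The following example shows how to create a forecasting job by setting the primary metric and setting limits on the training run: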

from azure.ai.ml import automl

# Set forecasting variables.
# As needed, modify the variable values to run the snippet successfully.
forecasting_job = automl.forecasting(
    compute="cpu-cluster",
    experiment_name="sdk-v2-automl-forecasting-job",
    training_data=my_training_data_input,
    target_column_name=target_column_name,
    primary_metric="normalized_root_mean_squared_error",
    n_cross_validations="auto",
)

# Set optional limits.
forecasting_job.set_limits(
    timeout_minutes=120,
    trial_timeout_minutes=30,
    max_concurrent_trials=4,
)

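Configure the general properties of the AutoML job, including:

The primary metric.
The name of the target column in the training data.
The cross-validation settings.
The resource limits for the job.

For more information, see the forecasting command job YAML schema, training parameters, and limits.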

$schema: https://azuremlschemas.azureedge.net/latest/autoMLForecastingJob.schema.json
type: automl

experiment_name: cli-v2-automl-forecasting-job
description: A time-series forecasting AutoML job
task: forecasting

training_data:
    path: "./train_data"
    type: mltable

compute: azureml:cpu-compute

# Settings for primary metric, target/label column name, cross validation.
primary_metric: normalized_root_mean_squared_error
target_column_name: <target_column_name>
n_cross_validations: auto

# Settings for training job limits on time, concurrency, and others.
limits:
    timeout_minutes: 120
    trial_timeout_minutes: 30
    max_concurrent_trials: 4

forecasting:
    # Forecasting-specific settings.

training:
    # Training settings.

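Forecasting job settings

Forecasting tasks have many settings that are specific to forecasting. The most basic of these settings are the name of the time column in the training data and the forecast horizon.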

Python SDK
Azure CLI

Use the ForecastingJob methods to configure these settings:

# Forecasting-specific configuration.
forecasting_job.set_forecast_settings(
    time_column_name=time_column_name,
    forecast_horizon=24
)

Configure these settings in the forecasting section of the job YAML configuration:

$schema: https://azuremlschemas.azureedge.net/latest/autoMLForecastingJob.schema.json
type: automl

experiment_name: cli-v2-automl-forecasting-job
description: A time-series forecasting AutoML job
task: forecasting

training_data:
    path: "./train_data"
    type: mltable

compute: azureml:cpu-compute

primary_metric: normalized_root_mean_squared_error
target_column_name: <target_column_name>
n_cross_validations: auto

limits:
    timeout_minutes: 120
    trial_timeout_minutes: 30
    max_concurrent_trials: 4

# Forecasting-specific settings.
# Set the horizon to 24 for this example. The horizon generally depends on the business scenario.
forecasting:
    time_column_name: <time_column_name>
    forecast_horizon: 24

training:
    # Training settings.

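The time column name is a required setting. In general, set the forecast horizon according to your prediction scenario. If your data contains multiple time series, specify the names of the time series ID columns. When these columns are grouped, they define the individual series. For example, suppose your data consists of hourly sales from different stores and brands. The following sample shows how to set the time series ID columns, assuming the data contains columns named store and brand: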

Python SDK
Azure CLI

# Forecasting-specific configuration.
# Add time series IDs for store and brand.
forecasting_job.set_forecast_settings(
    ...,  # Other settings.
    time_series_id_column_names=['store', 'brand']
)

# Forecasting-specific settings.
# Add time series IDs for store and brand.
forecasting:
    # Other settings.
    time_series_id_column_names: ["store", "brand"]

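If you don't specify time series ID columns, AutoML attempts to detect them automatically in your data.

Other settings are optional, as described in the following section.

Optional forecasting job settings

Optional configurations are available for forecasting tasks, such as enabling deep learning and specifying a target rolling-window aggregation. For a complete list of parameters, see the reference documentation.

Model search settings

Two optional settings control the model space in which AutoML searches for the best model: allowed_training_algorithms and blocked_training_algorithms. To restrict the search space to a given set of model classes, use the allowed_training_algorithms parameter, as shown in the following example: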

Python SDK
Azure CLI

# Only search ExponentialSmoothing and ElasticNet models.
forecasting_job.set_training(
    allowed_training_algorithms=["ExponentialSmoothing", "ElasticNet"]
)

$schema: https://azuremlschemas.azureedge.net/latest/autoMLForecastingJob.schema.json
type: automl

experiment_name: cli-v2-automl-forecasting-job
description: A time-series forecasting AutoML job
task: forecasting

training_data:
    path: "./train_data"
    type: mltable

compute: azureml:cpu-compute

primary_metric: normalized_root_mean_squared_error
target_column_name: <target_column_name>
n_cross_validations: auto

limits:
    timeout_minutes: 120
    trial_timeout_minutes: 30
    max_concurrent_trials: 4

forecasting:
    time_column_name: <time_column_name>
    forecast_horizon: 24

# Training settings.
# Only search ExponentialSmoothing and ElasticNet models.
training:
    allowed_training_algorithms: ["ExponentialSmoothing", "ElasticNet"]
    # Other training settings.

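In this case, the forecasting job searches only over the Exponential Smoothing and Elastic Net model classes. To remove a given set of model classes from the search space, use blocked_training_algorithms, as shown in the following example: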

Python SDK
Azure CLI

# Search over all model classes except Prophet.
forecasting_job.set_training(
    blocked_training_algorithms=["Prophet"]
)

# Training settings.
# Search over all model classes except Prophet.
training:
    blocked_training_algorithms: ["Prophet"]
    # Other training settings.

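The job searches over all model classes except Prophet. For a list of the forecasting model names accepted in allowed_training_algorithms and blocked_training_algorithms, see Training properties. You can apply either allowed_training_algorithms or blocked_training_algorithms to a training run, but not both.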
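Because the two settings are mutually exclusive, a small guard in your own setup code can catch the mistake before a job is submitted. The following is a minimal sketch; build_training_settings is a hypothetical helper, not an SDK function, and it only builds a dictionary of keyword arguments.

```python
def build_training_settings(allowed=None, blocked=None):
    """Build keyword arguments for the training settings, rejecting invalid combinations."""
    if allowed and blocked:
        raise ValueError(
            "Set allowed_training_algorithms or blocked_training_algorithms, not both."
        )
    settings = {}
    if allowed:
        settings["allowed_training_algorithms"] = list(allowed)
    if blocked:
        settings["blocked_training_algorithms"] = list(blocked)
    return settings


settings = build_training_settings(blocked=["Prophet"])
print(settings)

# With a configured job, the result could be passed on, for example:
# forecasting_job.set_training(**settings)
```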

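Enable learning for deep neural networks

AutoML ships with a custom deep neural network (DNN) model named TCNForecaster. This model is a temporal convolutional network (TCN) that applies common imaging task methods to time-series modeling. One-dimensional "causal" convolutions form the backbone of the network and enable the model to learn complex patterns over long durations in the training history. For more information, see Introduction to TCNForecaster.

TCNForecaster often achieves higher accuracy than standard time-series models when the training history contains thousands of observations or more. However, because of its higher capacity, TCNForecaster models take longer to train and sweep.

You can enable TCNForecaster in AutoML by setting the enable_dnn_training flag in the training configuration, as shown in the following example: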

Python SDK
Azure CLI

# Include TCNForecaster models in the model search.
forecasting_job.set_training(
    enable_dnn_training=True
)

# Training settings.
# Include TCNForecaster models in the model search.
training:
    enable_dnn_training: true
    # Other training settings.

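By default, TCNForecaster training is limited to a single compute node and a single GPU, if available, per model trial. For large data scenarios, distribute each TCNForecaster trial over multiple cores, GPUs, and nodes. For more information and code samples, see Distributed training.

To enable DNN for an AutoML experiment created in Azure Machine Learning studio, see the task type settings in the studio UI article.

Note

When you enable DNN for experiments created with the SDK, best model explanations are disabled.
DNN support for forecasting in automated machine learning isn't available for runs initiated in Azure Databricks.
Use GPU compute types when you enable DNN training.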

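Lag and rolling window features

Recent values of the target are often impactful features in a forecasting model. Accordingly, AutoML can create time-lagged and rolling-window aggregation features to potentially improve model accuracy.

Consider an energy demand forecasting scenario where weather data and historical demand are available. When window aggregation is applied over the most recent three hours, feature engineering generates fields for the minimum, maximum, and sum over a three-hour sliding window, based on the defined settings. For instance, for the observation valid on September 8, 2017, at 4:00 AM, the maximum, minimum, and sum values are calculated from the demand values for September 8, 2017, 1:00 AM through 3:00 AM. This three-hour window shifts along to populate data for the remaining rows. For more information and examples, see Lag features for time-series forecasting in AutoML.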
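The sliding-window calculation can be sketched in a few lines of plain Python. This is an illustration of the idea only, not AutoML's implementation, and the demand values are hypothetical. Each row receives the minimum, maximum, and sum of the three observations that precede it, so the fourth row (4:00 AM) uses the 1:00 AM through 3:00 AM values.

```python
def rolling_window_features(values, window_size=3):
    """For each position, aggregate the `window_size` values that precede it."""
    features = []
    for i in range(len(values)):
        window = values[max(0, i - window_size):i]
        if len(window) < window_size:
            # Not enough history yet to fill the window.
            features.append(None)
        else:
            features.append(
                {"min": min(window), "max": max(window), "sum": sum(window)}
            )
    return features


# Hypothetical hourly demand starting at 1:00 AM.
demand = [3100, 3000, 2900, 3050, 3200]
features = rolling_window_features(demand)
for value, feature in zip(demand, features):
    print(value, feature)
```

The first three rows have no complete window, which is why rolling-window features reduce the amount of usable history at the start of each series.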

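You can enable lag and rolling-window aggregation features for the target by setting the rolling window size and the lag orders that you want to create. The window size in the previous example is three. You can also enable lags for features by using the feature_lags setting. In the following example, all of these settings are set to auto, which instructs AutoML to determine the settings automatically by analyzing the correlation structure of your data.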

Python SDK
Azure CLI

forecasting_job.set_forecast_settings(
    ...,  # Other settings.
    target_lags='auto', 
    target_rolling_window_size='auto',
    feature_lags='auto'
)

# Forecasting-specific settings.
# Auto configure lags and rolling-window features.
forecasting:
    target_lags: auto
    target_rolling_window_size: auto
    feature_lags: auto
    # Other settings.

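Short series handling

AutoML considers a time series a short series if it doesn't have enough data points to conduct the training and validation phases of model development. For more information, see Training data length requirements.

AutoML can take several actions for short series. You can configure these actions by using the short_series_handling_config setting. The default value is auto. The following table describes the settings:

Setting	Description	Notes
`auto`	The default value for short series handling.	- If all series are short, pad the data. - If not all series are short, drop the short series.
`pad`	If `short_series_handling_config = pad` is used, AutoML adds random values to each short series that it finds. AutoML pads the target column with white noise.	You can use the following column types with the specified padding: - Object columns, padded with `NaN`s. - Numeric columns, padded with 0 (zero). - Boolean/logic columns, padded with `False`.
`drop`	If `short_series_handling_config = drop` is used, AutoML drops the short series. It doesn't use them for training or prediction.	Predictions for these series return `NaN`.
`None`	No series is padded or dropped.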
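The auto rule in the table can be restated as a small decision function. The following sketch only mirrors the rule as described above. The min_length threshold is a hypothetical input, because AutoML derives the actual minimum length from the job settings.

```python
def plan_short_series_handling(series_lengths, min_length):
    """Return the action implied by the `auto` rule and the affected series."""
    short = {name for name, n in series_lengths.items() if n < min_length}
    if not short:
        return "none", set()
    if len(short) == len(series_lengths):
        # Every series is short: pad them all.
        return "pad", short
    # Only some series are short: drop those.
    return "drop", short


# Hypothetical series lengths keyed by series ID.
action, affected = plan_short_series_handling({"store_a": 5, "store_b": 50}, min_length=10)
print(action, affected)
```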

The following example sets short series handling so that all short series are padded to the minimum required length:

Python SDK
Azure CLI

forecasting_job.set_forecast_settings(
    ...,  # Other settings.
    short_series_handling_config='pad'
)

# Forecasting-specific settings.
# Pad all short series to the minimum required length.
forecasting:
    short_series_handling_config: pad
    # Other settings.

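Note

Padding can affect the accuracy of the resulting model because it introduces artificial data to avoid training failures. If many of the series are short, you might also see some impact in explainability results.

Frequency and target data aggregation

Use the frequency and data aggregation options to avoid failures caused by irregular data. Your data is irregular if it doesn't follow a set cadence in time, such as hourly or daily. Point-of-sale data is a good example of irregular data. In these scenarios, AutoML can aggregate your data to a desired frequency and then build a forecasting model from the aggregates.

Set the frequency and target_aggregate_function options to handle irregular data. The frequency option accepts Pandas DateOffset strings as input. The following table shows the supported values for the aggregation function:

Function	Description
`sum`	Sum of target values
`mean`	Mean or average of target values
`min`	Minimum value of a target
`max`	Maximum value of a target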

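AutoML applies aggregation to the following columns:

Column	Aggregation method
Numerical predictors	AutoML uses the `sum`, `mean`, `min`, and `max` functions. It generates new columns. Each column name includes a suffix that identifies the name of the aggregation function applied to the column values.
Categorical predictors	AutoML uses the value of the `forecast_mode` parameter to aggregate the data. It's the most prominent category in the window. For more information, see the descriptions of the parameter in the Many models pipeline and HTS pipeline sections.
Data predictors	AutoML uses the minimum target value (`min`), maximum target value (`max`), and `forecast_mode` parameter settings to aggregate the data.
Target	AutoML aggregates the values according to the specified operation. Typically, the `sum` function is appropriate for most scenarios.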
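To make the target aggregation concrete, the following sketch buckets irregular timestamps into hourly bins and sums the target in each bin. It uses only the standard library and hypothetical point-of-sale records. It illustrates the idea and is not how AutoML performs the aggregation internally.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_hourly(records, aggregate=sum):
    """Group (timestamp, value) pairs by hour and aggregate each group."""
    buckets = defaultdict(list)
    for timestamp, value in records:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: aggregate(values) for hour, values in sorted(buckets.items())}


# Hypothetical irregular sales records.
sales = [
    ("2024-01-01T09:05:00", 4.0),
    ("2024-01-01T09:40:00", 6.0),
    ("2024-01-01T11:15:00", 5.0),
]
hourly = aggregate_hourly(sales)
print(hourly)
```

Passing min or max as the aggregate argument gives the other supported aggregations. Note that the 10:00 hour has no records and is simply absent from the result.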

The following example sets the frequency to hourly and the aggregation function to summation:

Python SDK
Azure CLI

# Aggregate the data to hourly frequency.
forecasting_job.set_forecast_settings(
    ...,  # Other settings.
    frequency='H',
    target_aggregate_function='sum'
)

# Forecasting-specific settings.
# Aggregate the data to hourly frequency.
forecasting:
    frequency: H
    target_aggregate_function: sum
    # Other settings.

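Custom cross-validation settings

Two customizable settings control cross-validation for forecasting jobs. Use the n_cross_validations parameter to customize the number of folds. Configure the cv_step_size parameter to define the time offset between folds. For more information, see Forecasting model selection.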
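The interaction between the number of folds and the step size is easier to see with indices. The following sketch builds generic rolling-origin folds, under the assumption that each validation window is one forecast horizon long and that consecutive folds are offset by the step size. AutoML's exact fold construction might differ, and the function name is hypothetical.

```python
def rolling_origin_folds(n_observations, forecast_horizon, n_folds, step_size):
    """Return (train_indices, validation_indices) ranges, earliest fold first."""
    folds = []
    for k in range(n_folds):
        valid_end = n_observations - k * step_size
        valid_start = valid_end - forecast_horizon
        if valid_start <= 0:
            raise ValueError("Not enough observations for the requested folds.")
        # Train on everything before the validation window.
        folds.append((range(0, valid_start), range(valid_start, valid_end)))
    return list(reversed(folds))


# 100 daily observations, a 14-day horizon, five folds, seven days between folds.
folds = rolling_origin_folds(n_observations=100, forecast_horizon=14, n_folds=5, step_size=7)
for train, valid in folds:
    print(f"train 0..{train.stop - 1}, validate {valid.start}..{valid.stop - 1}")
```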

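By default, AutoML sets both settings automatically based on characteristics of your data. Advanced users might want to set them manually. For example, suppose you have daily sales data and you want your validation setup to consist of five folds with a seven-day offset between adjacent folds. The following code samples show how to set these values: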

Python SDK
Azure CLI

from azure.ai.ml import automl

# Create a job with five CV folds.
forecasting_job = automl.forecasting(
    ...,  # Other training parameters.
    n_cross_validations=5,
)

# Set the step size between folds to seven days.
forecasting_job.set_forecast_settings(
    ...,  # Other settings.
    cv_step_size=7
)

$schema: https://azuremlschemas.azureedge.net/latest/autoMLForecastingJob.schema.json
type: automl

experiment_name: cli-v2-automl-forecasting-job
description: A time-series forecasting AutoML job
task: forecasting

training_data:
    path: "./train_data"
    type: mltable

compute: azureml:cpu-compute

primary_metric: normalized_root_mean_squared_error
target_column_name: <target_column_name>
# Use five CV folds.
n_cross_validations: 5

# Set the step size between folds to seven days.
forecasting:
    cv_step_size: 7
    # Other settings.

limits:
    # Limit settings.

training:
    # Training settings.

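Custom featurization

By default, AutoML augments training data with engineered features to increase the accuracy of the models. For more information, see Automated feature engineering. You can customize some of the preprocessing steps by using the featurization configuration of the forecasting job.

The following table lists the supported customizations for forecasting:

Customization	Description	Options
Column purpose update	Override the autodetected feature type for the specified column.	`categorical`, `dateTime`, `numeric`
Transformer parameter update	Update the parameters for the specified imputer.	`{"strategy": "constant", "fill_value": <value>}`, `{"strategy": "median"}`, `{"strategy": "ffill"}`

For example, suppose you have a retail demand scenario where the data includes prices, an on sale flag, and a product type. The following example shows how to set customized types and imputers for these features: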

Python SDK
Azure CLI

from azure.ai.ml.automl import ColumnTransformer

# Customize imputation methods for price and is_on_sale features.
# Median value imputation for price, constant value of zero for is_on_sale.
transformer_params = {
    "imputer": [
        ColumnTransformer(fields=["price"], parameters={"strategy": "median"}),
        ColumnTransformer(fields=["is_on_sale"], parameters={"strategy": "constant", "fill_value": 0}),
    ],
}

# Set the featurization.
# Ensure product_type feature is interpreted as categorical.
forecasting_job.set_featurization(
    mode="custom",
    transformer_params=transformer_params,
    column_name_and_types={"product_type": "Categorical"},
)

$schema: https://azuremlschemas.azureedge.net/latest/autoMLForecastingJob.schema.json
type: automl

experiment_name: cli-v2-automl-forecasting-job
description: A time-series forecasting AutoML job
task: forecasting

training_data:
    path: "./train_data"
    type: mltable

compute: azureml:cpu-compute

primary_metric: normalized_root_mean_squared_error
target_column_name: <target_column_name>
n_cross_validations: auto

# Customize imputation methods for price and is_on_sale features.
# Median value imputation for price, constant value of zero for is_on_sale.
featurization:
    mode: custom
    column_name_and_types:
        product_type: Categorical
    transformer_params:
        imputer:
            - fields: ["price"]
              parameters:
                  strategy: median
            - fields: ["is_on_sale"]
              parameters:
                  strategy: constant
                  fill_value: 0

forecasting:
    # Forecasting-specific settings.

limits:
    # Limit settings.

training:
    # Training settings.

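If you use Azure Machine Learning studio for your experiment, see Configure featurization settings in the studio.

Submit the forecasting job

After you configure all settings, you're ready to run the forecasting job. The following examples demonstrate the process.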

Python SDK
Azure CLI

# Submit the AutoML job.
returned_job = ml_client.jobs.create_or_update(
    forecasting_job
)

print(f"Created job: {returned_job}")

# Get a URL for the job in the studio UI.
returned_job.services["Studio"].endpoint

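In the following Azure CLI command, the job YAML configuration is at the path ./automl-forecasting-job.yml in the current working directory. If you run the command from a different directory, change the path accordingly.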

run_id=$(az ml job create --file automl-forecasting-job.yml --query name -o tsv)

Use the stored run ID to return information about the job. The --web parameter opens the Azure Machine Learning studio web UI, where you can view details about the job:

az ml job show -n $run_id --web

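After you submit the job, AutoML provisions compute resources, applies featurization and other preparation steps to the input data, and then begins sweeping over forecasting models. For more information, see Forecasting methods in AutoML and Model sweeping and selection for forecasting in AutoML.

Orchestrate training, inference, and evaluation with components and pipelines

A machine learning workflow likely requires more than just training. Other common tasks that you can orchestrate in Azure Machine Learning along with training jobs are inferencing, which retrieves model predictions on newer data, and evaluating model accuracy on a test set with known target values. To support inferencing and evaluation tasks, Azure Machine Learning provides components, which are self-contained pieces of code that perform one step in an Azure Machine Learning pipeline.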

Python SDK
Azure CLI

The following example retrieves component code from a client registry:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

# Get credential to access azureml registry.
try:
    credential = DefaultAzureCredential()
    # Check if token can be obtained successfully.
    credential.get_token("https://management.chinacloudapi.cn/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential fails.
    credential = InteractiveBrowserCredential()

# Create client to access assets in azureml-preview registry.
ml_client_registry = MLClient(
    credential=credential,
    registry_name="azureml-preview"
)

# Create client to access assets in azureml registry.
ml_client_metrics_registry = MLClient(
    credential=credential,
    registry_name="azureml"
)

# Get inference component from registry.
inference_component = ml_client_registry.components.get(
    name="automl_forecasting_inference",
    label="latest"
)

# Get component to compute evaluation metrics from registry.
compute_metrics_component = ml_client_metrics_registry.components.get(
    name="compute_metrics",
    label="latest"
)

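Next, define a factory function that creates pipelines to orchestrate training, inference, and metric computation. For more information, see Configure the experiment.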

from azure.ai.ml import Input, Output, automl
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline

@pipeline(description="AutoML Forecasting Pipeline")
def forecasting_train_and_evaluate_factory(
    train_data_input,
    test_data_input,
    target_column_name,
    time_column_name,
    forecast_horizon,
    primary_metric='normalized_root_mean_squared_error',
    cv_folds='auto'
):
    # Configure training node of pipeline.
    training_node = automl.forecasting(
        training_data=train_data_input,
        target_column_name=target_column_name,
        primary_metric=primary_metric,
        n_cross_validations=cv_folds,
        outputs={"best_model": Output(type=AssetTypes.MLFLOW_MODEL)},
    )

    training_node.set_forecast_settings(
        time_column_name=time_column_name,
        forecast_horizon=forecast_horizon,
        # Other settings, such as frequency.
    )
    
    training_node.set_training(
        # Training parameters.
        ...
    )
    
    training_node.set_limits(
        # Limit settings.
        ...
    )

    # Configure inference node to make rolling forecasts on test set.
    inference_node = inference_component(
        test_data=test_data_input,
        model_path=training_node.outputs.best_model,
        target_column_name=target_column_name,
        forecast_mode='rolling',
        step=1
    )

    # Configure metrics calculation node.
    compute_metrics_node = compute_metrics_component(
        task="tabular-forecasting",
        ground_truth=inference_node.outputs.inference_output_file,
        prediction=inference_node.outputs.inference_output_file,
        evaluation_config=inference_node.outputs.evaluation_config_output_file
    )

    # Return dictionary with evaluation metrics and raw test set forecasts.
    return {
        "metrics_result": compute_metrics_node.outputs.evaluation_result,
        "rolling_fcst_result": inference_node.outputs.inference_output_file
    }

Define the training and test data inputs contained in the local folders ./train_data and ./test_data.

my_train_data_input = Input(
    type=AssetTypes.MLTABLE,
    path="./train_data"
)

my_test_data_input = Input(
    type=AssetTypes.URI_FOLDER,
    path='./test_data',
)

Finally, construct the pipeline, set its default compute, and submit the job:

pipeline_job = forecasting_train_and_evaluate_factory(
    my_train_data_input,
    my_test_data_input,
    target_column_name,
    time_column_name,
    forecast_horizon
)

# Set pipeline-level compute.
pipeline_job.settings.default_compute = compute_name

# Submit pipeline job.
returned_pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name=experiment_name
)
returned_pipeline_job

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

description: AutoML Forecasting Pipeline
experiment_name: cli-v2-automl-forecasting-pipeline

# Set default compute for pipeline steps.
settings:
    default_compute: cpu-compute

# Pipeline inputs.
inputs:
    train_data_input:
        type: mltable
        path: "./train_data"
    test_data_input:
        type: uri_folder
        path: "./test_data"
    target_column_name: <target column name>
    time_column_name: <time column name>
    forecast_horizon: <forecast horizon>
    primary_metric: normalized_root_mean_squared_error
    cv_folds: auto

# Set pipeline outputs.
# Output the evaluation metrics and raw test set rolling forecasts.
outputs: 
    metrics_result:
        type: uri_file
        mode: upload
    rolling_fcst_result:
        type: uri_file
        mode: upload

jobs:
    # Configure automl training node of pipeline.
    training_node:
        type: automl
        task: forecasting
        primary_metric: ${{parent.inputs.primary_metric}}
        target_column_name: ${{parent.inputs.target_column_name}}
        training_data: ${{parent.inputs.train_data_input}}
        n_cross_validations: ${{parent.inputs.cv_folds}}
        training:
            # Training settings.
        forecasting:
            time_column_name: ${{parent.inputs.time_column_name}}
            forecast_horizon: ${{parent.inputs.forecast_horizon}}
            # Other forecasting-specific settings.
        limits:
            # Limit settings.
        outputs:
            best_model:
                type: mlflow_model

    # Configure inference node to make rolling forecasts on test set.
    inference_node:
        type: command
        component: azureml://registries/azureml-preview/components/automl_forecasting_inference
        inputs:
            target_column_name: ${{parent.inputs.target_column_name}}
            forecast_mode: rolling
            step: 1
            test_data: ${{parent.inputs.test_data_input}}
            model_path: ${{parent.jobs.training_node.outputs.best_model}}
        outputs:
            inference_output_file: ${{parent.outputs.rolling_fcst_result}}
            evaluation_config_output_file:
                type: uri_file

    # Configure metrics calculation node.
    compute_metrics:
        type: command
        component: azureml://registries/azureml/components/compute_metrics
        inputs:
            task: "tabular-forecasting"
            ground_truth: ${{parent.jobs.inference_node.outputs.inference_output_file}}
            prediction: ${{parent.jobs.inference_node.outputs.inference_output_file}}
            evaluation_config: ${{parent.jobs.inference_node.outputs.evaluation_config_output_file}}
        outputs:
            evaluation_result: ${{parent.outputs.metrics_result}}

AutoML requires training data in MLTable format.

Use the following command to start the pipeline run. The pipeline configuration is at the path ./automl-forecasting-pipeline.yml:

run_id=$(az ml job create --file automl-forecasting-pipeline.yml -w <Workspace> -g <Resource Group> --subscription <Subscription> --query name -o tsv)

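After you submit the run request, the pipeline runs AutoML training, rolling evaluation inference, and metric calculation in sequence. You can monitor and inspect the run in the studio UI. When the run finishes, you can download the rolling forecasts and evaluation metrics to the local working directory: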

Python SDK
Azure CLI

# Download metrics JSON.
ml_client.jobs.download(returned_pipeline_job.name, download_path=".", output_name='metrics_result')

# Download rolling forecasts.
ml_client.jobs.download(returned_pipeline_job.name, download_path=".", output_name='rolling_fcst_result')

az ml job download --name $run_id --download-path . --output-name metrics_result
az ml job download --name $run_id --download-path . --output-name rolling_fcst_result

You can review the output in the following locations:

Metrics: ./named-outputs/metrics_result/evaluationResult/metrics.json
Forecasts: ./named-outputs/rolling_fcst_result/inference_output_file (JSON Lines format)
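Because the forecasts are stored in JSON Lines format, each line is a separate JSON document. The following sketch reads such a stream with the standard library. The field names in the sample are hypothetical placeholders, since the actual columns depend on your data and the inference component.

```python
import io
import json


def read_json_lines(stream):
    """Parse one JSON document per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


# In-memory stand-in for the downloaded forecast file, with hypothetical fields.
sample = io.StringIO(
    '{"date": "2024-01-01", "predicted": 10.5}\n'
    "\n"
    '{"date": "2024-01-02", "predicted": 11.0}\n'
)
rows = read_json_lines(sample)
print(rows)

# For the downloaded file, open it instead of using the in-memory sample:
# with open("./named-outputs/rolling_fcst_result/inference_output_file") as f:
#     rows = read_json_lines(f)
```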

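For more information about rolling evaluation, see Inference and evaluation of forecasting models.

Forecast at scale: Many models

The many models components in AutoML enable you to train and manage millions of models in parallel. For more information, see Many models.

Many models training configuration

The many models training component accepts a YAML-format configuration file of AutoML training settings. The component applies these settings to each AutoML instance that it starts. The YAML file has the same specification as the forecasting command job, plus the partition_column_names and allow_multi_partitions parameters.

Parameter	Description
`partition_column_names`	Column names in the data that, when grouped, define the data partitions. The many models training component launches an independent training job on each partition.
`allow_multi_partitions`	An optional flag that allows training one model per partition when each partition contains more than one unique time series. The default value is `false`.

Here's a sample YAML configuration: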

$schema: https://azuremlsdk2.blob.core.chinacloudapi.cn/preview/0.0.1/autoMLJob.schema.json
type: automl

description: A time-series forecasting job config
compute: azureml:<cluster-name>
task: forecasting
primary_metric: normalized_root_mean_squared_error
target_column_name: sales
n_cross_validations: 3

forecasting:
  time_column_name: date
  time_series_id_column_names: ["state", "store"]
  forecast_horizon: 28

training:
  blocked_training_algorithms: ["ExtremeRandomTrees"]

limits:
  timeout_minutes: 15
  max_trials: 10
  max_concurrent_trials: 4
  max_cores_per_trial: -1
  trial_timeout_minutes: 15
  enable_early_termination: true
  
partition_column_names: ["state", "store"]
allow_multi_partitions: false

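In subsequent examples, the configuration is stored at the path ./automl_settings_mm.yml.

Many models pipeline

Next, you define a factory function that creates pipelines for orchestration of many models training, inference, and metric computation. The following table describes the parameters for this factory function:

Parameter	Description
`max_nodes`	The number of compute nodes to use in the training job.
`max_concurrency_per_node`	The number of AutoML processes to run on each node. The total concurrency of a many models job is `max_nodes` * `max_concurrency_per_node`.
`parallel_step_timeout_in_seconds`	The timeout for the many models component, specified in seconds.
`retrain_failed_model`	A flag that enables retraining for failed models. This value is useful if previous many models runs resulted in failed AutoML jobs on some data partitions. When you enable this flag, many models runs training jobs only for the previously failed partitions.
`forecast_mode`	The inference mode for model evaluation. Valid values are `recursive` (default) and `rolling`. For more information, see Inference and evaluation of forecasting models and the ManyModelsInferenceParameters class reference.
`step`	The step size for the rolling forecast. The default value is 1. For more information, see Inference and evaluation of forecasting models and the ManyModelsInferenceParameters class reference.

The following examples demonstrate a factory method for constructing many models training and model evaluation pipelines: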

Python SDK
Azure CLI

from azure.ai.ml import Input, MLClient
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

# Get credential to access azureml registry.
try:
    credential = DefaultAzureCredential()
    # Check whether token can be obtained.
    credential.get_token("https://management.chinacloudapi.cn/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential fails.
    credential = InteractiveBrowserCredential()

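# Create clients to access assets in the azureml-preview and azureml registries.
ml_client_registry = MLClient(
    credential=credential,
    registry_name="azureml-preview"
)
ml_client_metrics_registry = MLClient(
    credential=credential,
    registry_name="azureml"
)

# Get many-models training component.
mm_train_component = ml_client_registry.components.get(
    name='automl_many_models_training',
    label='latest'
)

# Get many-models inference component.
mm_inference_component = ml_client_registry.components.get(
    name='automl_many_models_inference',
    label='latest'
)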

# Get component to compute evaluation metrics.
compute_metrics_component = ml_client_metrics_registry.components.get(
    name="compute_metrics",
    label="latest"
)

@pipeline(description="AutoML Many Models Forecasting Pipeline")
def many_models_train_evaluate_factory(
    train_data_input,
    test_data_input,
    automl_config_input,
    compute_name,
    max_concurrency_per_node=4,
    parallel_step_timeout_in_seconds=3700,
    max_nodes=4,
    retrain_failed_model=False,
    forecast_mode="rolling",
    forecast_step=1
):
    mm_train_node = mm_train_component(
        raw_data=train_data_input,
        automl_config=automl_config_input,
        max_nodes=max_nodes,
        max_concurrency_per_node=max_concurrency_per_node,
        parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
        retrain_failed_model=retrain_failed_model,
        compute_name=compute_name
    )

    mm_inference_node = mm_inference_component(
        raw_data=test_data_input,
        max_nodes=max_nodes,
        max_concurrency_per_node=max_concurrency_per_node,
        parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
        optional_train_metadata=mm_train_node.outputs.run_output,
        forecast_mode=forecast_mode,
        step=forecast_step,
        compute_name=compute_name
    )

    compute_metrics_node = compute_metrics_component(
        task="tabular-forecasting",
        prediction=mm_inference_node.outputs.evaluation_data,
        ground_truth=mm_inference_node.outputs.evaluation_data,
        evaluation_config=mm_inference_node.outputs.evaluation_configs
    )

    # Return metrics results from rolling evaluation.
    return {
        "metrics_result": compute_metrics_node.outputs.evaluation_result
    }

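Construct the pipeline with the factory function. The training and test data are in the local folders ./data/train and ./data/test. Finally, set the default compute and submit the job, as shown in the following example: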

pipeline_job = many_models_train_evaluate_factory(
    train_data_input=Input(
        type="uri_folder",
        path="./data/train"
    ),
    test_data_input=Input(
        type="uri_folder",
        path="./data/test"
    ),
    automl_config_input=Input(
        type="uri_file",
        path="./automl_settings_mm.yml"
    ),
    compute_name="<cluster name>"
)
pipeline_job.settings.default_compute = "<cluster name>"

returned_pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name=experiment_name,
)
ml_client.jobs.stream(returned_pipeline_job.name)

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

description: AutoML Many Models Forecasting Pipeline
experiment_name: cli-v2-automl-mm-forecasting-pipeline

# Set default compute for pipeline steps.
settings:
    default_compute: azureml:cpu-compute

# Set pipeline inputs.
inputs:
    train_data_input:
        type: uri_folder
        path: "./train_data"
        mode: direct
    test_data_input:
        type: uri_folder
        path: "./test_data"
    automl_config_input:
        type: uri_file
        path: "./automl_settings_mm.yml"
    max_nodes: 4
    max_concurrency_per_node: 4
    parallel_step_timeout_in_seconds: 3700
    forecast_mode: rolling
    step: 1
    retrain_failed_model: False

# Set pipeline outputs.
# Output the evaluation metrics and raw test set rolling forecasts.
outputs: 
    metrics_result:
        type: uri_file
        mode: upload

jobs:
    # Configure AutoML many-models training component.
    mm_train_node:
        type: command
        component: azureml://registries/azureml-preview/components/automl_many_models_training
        inputs:
            raw_data: ${{parent.inputs.train_data_input}}
            automl_config: ${{parent.inputs.automl_config_input}}
            max_nodes: ${{parent.inputs.max_nodes}}
            max_concurrency_per_node: ${{parent.inputs.max_concurrency_per_node}}
            parallel_step_timeout_in_seconds: ${{parent.inputs.parallel_step_timeout_in_seconds}}
            retrain_failed_model: ${{parent.inputs.retrain_failed_model}}
        outputs:
            run_output:
                type: uri_folder

    # Configure inference node to make rolling forecasts on test set.
    mm_inference_node:
        type: command
        component: azureml://registries/azureml-preview/components/automl_many_models_inference
        inputs:
            raw_data: ${{parent.inputs.test_data_input}}
            max_concurrency_per_node: ${{parent.inputs.max_concurrency_per_node}}
            parallel_step_timeout_in_seconds: ${{parent.inputs.parallel_step_timeout_in_seconds}}
            forecast_mode: ${{parent.inputs.forecast_mode}}
            step: ${{parent.inputs.step}}
            max_nodes: ${{parent.inputs.max_nodes}}
            optional_train_metadata: ${{parent.jobs.mm_train_node.outputs.run_output}}
        outputs:
            run_output:
                type: uri_folder
            evaluation_configs:
                type: uri_file
            evaluation_data:
                type: uri_file

    # Configure metrics calculation node.
    compute_metrics:
        type: command
        component: azureml://registries/azureml/components/compute_metrics
        inputs:
            task: "tabular-forecasting"
            ground_truth: ${{parent.jobs.mm_inference_node.outputs.evaluation_data}}
            prediction: ${{parent.jobs.mm_inference_node.outputs.evaluation_data}}
            evaluation_config: ${{parent.jobs.mm_inference_node.outputs.evaluation_configs}}
        outputs:
            evaluation_result: ${{parent.outputs.metrics_result}}

Use the following command to start the pipeline job. The many-models pipeline configuration file is at the path ./automl-mm-forecasting-pipeline.yml:

az ml job create --file automl-mm-forecasting-pipeline.yml -w <Workspace> -g <Resource Group> --subscription <Subscription>

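After the job finishes, you can download the evaluation metrics locally by using the same procedure as in the single training run pipeline.

For a more detailed example, see the Demand forecasting using many models notebook.

Training considerations for a many-models run

The many-models training and inference components conditionally partition your data according to the partition_column_names setting, so that each partition ends up in its own file. This process can be slow or fail when you have a lot of data. We recommend that you partition your data manually before you run many-models training or inference.
During many-models training, models are automatically registered in the workspace, so you don't need to register them manually. Models are named after the partition they were trained on, and these names aren't customizable. Tags aren't customizable either. These properties are used to automatically detect models during inference.
Deploying individual models isn't scalable, but you can use PipelineComponentBatchDeployment to simplify the deployment process. For an example, see the Demand forecasting using many models notebook.
During inference, the appropriate models (latest version) are automatically selected based on the partitions sent in the inference data. By default, when you use training_experiment_name, the latest model is used. You can override this behavior and select models from a specific training run by also providing train_run_id.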
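The manual partitioning recommended above can be sketched with the Python standard library. This is a minimal illustration, not the components' own partitioning logic: the function name `partition_csv`, the one-CSV-per-partition layout, and the file naming scheme are all assumptions, so check the component documentation for the input layout it expects. The sketch also holds all rows in memory, which suits only data that fits on one machine.

```python
import csv
from collections import defaultdict
from pathlib import Path


def partition_csv(source_path, output_dir, partition_column_names):
    """Split one CSV file into one CSV file per unique partition key."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by the tuple of values in the partition columns.
    groups = defaultdict(list)
    with open(source_path, newline="") as source_file:
        reader = csv.DictReader(source_file)
        fieldnames = reader.fieldnames
        for row in reader:
            key = tuple(row[name] for name in partition_column_names)
            groups[key].append(row)

    # Write each group to its own file.
    # Hypothetical naming scheme: partition values joined by underscores.
    written = {}
    for key, rows in groups.items():
        file_path = output_dir / ("_".join(key) + ".csv")
        with open(file_path, "w", newline="") as partition_file:
            writer = csv.DictWriter(partition_file, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written[key] = file_path
    return written
```

For example, `partition_csv("./data/all.csv", "./data/train", ["state", "store"])` would write one file per state and store combination. Partition values that contain characters not allowed in file names would need to be sanitized first.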

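Note

The default parallelism limit for a many-models run within a subscription is 320. If your workload requires a higher limit, you can contact Microsoft support.

Forecast at scale: Hierarchical time series

The hierarchical time series (HTS) components in AutoML enable you to train a large number of models on data that has a hierarchical structure. For more information, see Hierarchical time series forecasting.

HTS training configuration

The HTS training component accepts a YAML format configuration file of AutoML training settings. The component applies these settings to each AutoML instance that it runs. This YAML file has the same specification as the forecasting command job, but it includes additional parameters related to the hierarchy information: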

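Parameter	Description
`hierarchy_column_names`	A list of column names in the data that define the hierarchical structure of the data. The order of the columns in this list determines the hierarchy levels. The degree of aggregation decreases as the list index increases. That is, the last column in the list defines the leaf (most disaggregated) level of the hierarchy.
`hierarchy_training_level`	The hierarchy level to use for forecast model training.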
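To make the relationship between the column order and the training level concrete, the following sketch rolls leaf-level rows up to a chosen level. It only illustrates the concept and isn't how the HTS component is implemented. The sample rows, the helper `aggregate_to_level`, and the choice of summation as the aggregation function are assumptions.

```python
from collections import defaultdict

hierarchy_column_names = ["state", "store", "SKU"]
hierarchy_training_level = "store"

# Hypothetical leaf-level rows (one row per date and SKU).
rows = [
    {"date": "2024-01-01", "state": "WA", "store": "S1", "SKU": "A", "sales": 10},
    {"date": "2024-01-01", "state": "WA", "store": "S1", "SKU": "B", "sales": 5},
    {"date": "2024-01-01", "state": "WA", "store": "S2", "SKU": "A", "sales": 7},
    {"date": "2024-01-02", "state": "WA", "store": "S1", "SKU": "A", "sales": 3},
]


def aggregate_to_level(rows, hierarchy, level, time_column, target_column):
    """Sum the target over all hierarchy columns below `level`."""
    # Keep the columns from the top of the hierarchy down to `level`.
    keep = hierarchy[: hierarchy.index(level) + 1]
    totals = defaultdict(float)
    for row in rows:
        key = (row[time_column],) + tuple(row[name] for name in keep)
        totals[key] += row[target_column]
    return dict(totals)


# One series per (state, store): the SKU column is aggregated away.
store_level = aggregate_to_level(
    rows, hierarchy_column_names, hierarchy_training_level, "date", "sales"
)
```

With `hierarchy_training_level` set to `"store"`, the two SKU rows for store S1 on 2024-01-01 collapse into a single value. Choosing `"state"` instead would also aggregate away the store column.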

Here's a sample YAML configuration:

$schema: https://azuremlsdk2.blob.core.chinacloudapi.cn/preview/0.0.1/autoMLJob.schema.json
type: automl

description: A time-series forecasting job config
compute: azureml:cluster-name
task: forecasting
primary_metric: normalized_root_mean_squared_error
log_verbosity: info
target_column_name: sales
n_cross_validations: 3

forecasting:
  time_column_name: "date"
  time_series_id_column_names: ["state", "store", "SKU"]
  forecast_horizon: 28

training:
  blocked_training_algorithms: ["ExtremeRandomTrees"]

limits:
  timeout_minutes: 15
  max_trials: 10
  max_concurrent_trials: 4
  max_cores_per_trial: -1
  trial_timeout_minutes: 15
  enable_early_termination: true
  
hierarchy_column_names: ["state", "store", "SKU"]
hierarchy_training_level: "store"

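In the subsequent examples, the configuration is stored at the path ./automl_settings_hts.yml.

HTS pipeline

Next, define a factory function that creates pipelines to orchestrate HTS training, inference, and metric computation. The following table describes the parameters of this factory function: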

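Parameter	Description
`forecast_level`	The level of the hierarchy to retrieve forecasts for.
`allocation_method`	The allocation method to use when forecasts are disaggregated. Valid values are `proportions_of_historical_average` and `average_historical_proportions`.
`max_nodes`	The number of compute nodes to use in the training job.
`max_concurrency_per_node`	The number of AutoML processes to run on each node. The total concurrency of an HTS job is `max_nodes` * `max_concurrency_per_node`.
`parallel_step_timeout_in_seconds`	The many-models component timeout, specified in seconds.
`forecast_mode`	The inference mode for model evaluation. Valid values are `recursive` and `rolling`. For more information, see Inference and evaluation of forecasting models and the HTSInferenceParameters class reference.
`step`	The step size for the rolling forecast. The default value is 1. For more information, see Inference and evaluation of forecasting models and the HTSInferenceParameters class reference.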
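The two `allocation_method` values share their names with the standard top-down disaggregation proportions from the forecasting literature. The sketch below computes both under those textbook definitions. This article doesn't state that AutoML computes them in exactly this way, so treat the functions and the sample numbers as illustrative assumptions.

```python
def average_historical_proportions(child_history, parent_history):
    """Mean over time of each period's child/parent ratio."""
    ratios = [c / p for c, p in zip(child_history, parent_history)]
    return sum(ratios) / len(ratios)


def proportions_of_historical_average(child_history, parent_history):
    """Ratio of the child's historical mean to the parent's historical mean."""
    child_mean = sum(child_history) / len(child_history)
    parent_mean = sum(parent_history) / len(parent_history)
    return child_mean / parent_mean


# Hypothetical history: one store with two SKUs over two periods.
sku_a = [10, 30]
sku_b = [10, 10]
store = [a + b for a, b in zip(sku_a, sku_b)]  # [20, 40]

share_a_ahp = average_historical_proportions(sku_a, store)     # (0.5 + 0.75) / 2
share_a_pha = proportions_of_historical_average(sku_a, store)  # 20 / 30

# Disaggregate a store-level forecast of 60 units to SKU A.
store_forecast = 60
forecast_a_ahp = store_forecast * share_a_ahp
forecast_a_pha = store_forecast * share_a_pha
```

The two methods give different shares whenever the parent series changes in level over time, because the first weights every period equally and the second weights periods by volume.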

Python SDK
Azure CLI

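from azure.ai.ml import Input, MLClient
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

# Get credential to access azureml registry.
try:
    credential = DefaultAzureCredential()
    # Check whether token can be obtained.
    credential.get_token("https://management.chinacloudapi.cn/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential fails.
    credential = InteractiveBrowserCredential()

# Create clients for the registries that host the components.
# The registry names match the component URIs used in the CLI example.
ml_client_registry = MLClient(credential=credential, registry_name="azureml-preview")
ml_client_metrics_registry = MLClient(credential=credential, registry_name="azureml")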

# Get HTS training component.
hts_train_component = ml_client_registry.components.get(
    name='automl_hts_training',
    version='latest'
)

# Get HTS inference component.
hts_inference_component = ml_client_registry.components.get(
    name='automl_hts_inference',
    version='latest'
)

# Get component to compute evaluation metrics.
compute_metrics_component = ml_client_metrics_registry.components.get(
    name="compute_metrics",
    label="latest"
)

@pipeline(description="AutoML HTS Forecasting Pipeline")
def hts_train_evaluate_factory(
    train_data_input,
    test_data_input,
    automl_config_input,
    max_concurrency_per_node=4,
    parallel_step_timeout_in_seconds=3700,
    max_nodes=4,
    forecast_mode="rolling",
    forecast_step=1,
    forecast_level="SKU",
    allocation_method='proportions_of_historical_average'
):
    hts_train = hts_train_component(
        raw_data=train_data_input,
        automl_config=automl_config_input,
        max_concurrency_per_node=max_concurrency_per_node,
        parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
        max_nodes=max_nodes
    )
    hts_inference = hts_inference_component(
        raw_data=test_data_input,
        max_nodes=max_nodes,
        max_concurrency_per_node=max_concurrency_per_node,
        parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
        optional_train_metadata=hts_train.outputs.run_output,
        forecast_level=forecast_level,
        allocation_method=allocation_method,
        forecast_mode=forecast_mode,
        step=forecast_step
    )
    compute_metrics_node = compute_metrics_component(
        task="tabular-forecasting",
        prediction=hts_inference.outputs.evaluation_data,
        ground_truth=hts_inference.outputs.evaluation_data,
        evaluation_config=hts_inference.outputs.evaluation_configs
    )

    # Return metrics results from rolling evaluation.
    return {
        "metrics_result": compute_metrics_node.outputs.evaluation_result
    }

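Use the factory function to construct the pipeline. The training and test data are in the local folders ./data/train and ./data/test, respectively. Finally, set the default compute and submit the job, as shown in the following example: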

pipeline_job = hts_train_evaluate_factory(
    train_data_input=Input(
        type="uri_folder",
        path="./data/train"
    ),
    test_data_input=Input(
        type="uri_folder",
        path="./data/test"
    ),
    automl_config_input=Input(
        type="uri_file",
        path="./automl_settings_hts.yml"
    )
)
pipeline_job.settings.default_compute = "cluster-name"

returned_pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name=experiment_name,
)
ml_client.jobs.stream(returned_pipeline_job.name)

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

description: AutoML HTS Forecasting Pipeline
experiment_name: cli-v2-automl-hts-forecasting-pipeline

# Set the default compute for pipeline steps.
settings:
    default_compute: cpu-compute

# Set pipeline inputs.
inputs:
    train_data_input:
        type: uri_folder
        path: "./train_data"
        mode: direct
    test_data_input:
        type: uri_folder
        path: "./test_data"
    automl_config_input:
        type: uri_file
        path: "./automl_settings_hts.yml"
    max_concurrency_per_node: 4
    parallel_step_timeout_in_seconds: 3700
    max_nodes: 4
    forecast_mode: rolling
    step: 1
    allocation_method: proportions_of_historical_average
    forecast_level: # Set to a level from hierarchy_column_names.

# Set pipeline outputs.
# Output evaluation metrics and raw test set rolling forecasts.
outputs: 
    metrics_result:
        type: uri_file
        mode: upload

jobs:
    # Configure AutoML HTS training component.
    hts_train_node:
        type: command
        component: azureml://registries/azureml-preview/components/automl_hts_training
        inputs:
            raw_data: ${{parent.inputs.train_data_input}}
            automl_config: ${{parent.inputs.automl_config_input}}
            max_nodes: ${{parent.inputs.max_nodes}}
            max_concurrency_per_node: ${{parent.inputs.max_concurrency_per_node}}
            parallel_step_timeout_in_seconds: ${{parent.inputs.parallel_step_timeout_in_seconds}}
        outputs:
            run_output:
                type: uri_folder


    # Configure inference node to make rolling forecasts on test set.
    hts_inference_node:
        type: command
        component: azureml://registries/azureml-preview/components/automl_hts_inference
        inputs:
            raw_data: ${{parent.inputs.test_data_input}}
            max_concurrency_per_node: ${{parent.inputs.max_concurrency_per_node}}
            parallel_step_timeout_in_seconds: ${{parent.inputs.parallel_step_timeout_in_seconds}}
            forecast_mode: ${{parent.inputs.forecast_mode}}
            step: ${{parent.inputs.step}}
            max_nodes: ${{parent.inputs.max_nodes}}
            optional_train_metadata: ${{parent.jobs.hts_train_node.outputs.run_output}}
            forecast_level: ${{parent.inputs.forecast_level}}
            allocation_method: ${{parent.inputs.allocation_method}}
        outputs:
            run_output:
                type: uri_folder
            evaluation_configs:
                type: uri_file
            evaluation_data:
                type: uri_file

    # Configure metrics calculation node.
    compute_metrics:
        type: command
        component: azureml://registries/azureml/components/compute_metrics
        inputs:
            task: "tabular-forecasting"
            ground_truth: ${{parent.jobs.hts_inference_node.outputs.evaluation_data}}
            prediction: ${{parent.jobs.hts_inference_node.outputs.evaluation_data}}
            evaluation_config: ${{parent.jobs.hts_inference_node.outputs.evaluation_configs}}
        outputs:
            evaluation_result: ${{parent.outputs.metrics_result}}

Use the following command to start the pipeline job. The HTS pipeline configuration is at the path ./automl-hts-forecasting-pipeline.yml:

az ml job create --file automl-hts-forecasting-pipeline.yml -w <Workspace> -g <Resource Group> --subscription <Subscription>

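After the job finishes, you can download the evaluation metrics locally by using the same procedure as in the single training run pipeline.

For a more detailed example, see the Demand forecasting using hierarchical time series notebook.

Training considerations for an HTS run

The HTS training and inference components conditionally partition your data according to the hierarchy_column_names setting, so that each partition is in its own file. This process can be slow or fail when you have a lot of data. We recommend that you partition your data manually before you run HTS training or inference.

Note

The default parallelism limit for an HTS run within a subscription is 320. If your workload requires a higher limit, you can contact Microsoft support.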
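As a quick sanity check before you submit a job, you can compare the total concurrency from the parameter table (`max_nodes` * `max_concurrency_per_node`) with the default limit in the preceding note. This sketch assumes that the limit applies to that product, which this article doesn't state explicitly. The helper names are hypothetical.

```python
DEFAULT_PARALLELISM_LIMIT = 320  # Default limit stated in the preceding note.


def total_concurrency(max_nodes, max_concurrency_per_node):
    """Total number of AutoML processes that run at the same time."""
    return max_nodes * max_concurrency_per_node


def exceeds_default_limit(max_nodes, max_concurrency_per_node,
                          limit=DEFAULT_PARALLELISM_LIMIT):
    """Return True if the requested concurrency is above the limit."""
    return total_concurrency(max_nodes, max_concurrency_per_node) > limit


# The pipeline examples in this article use 4 nodes with 4 processes each.
print(total_concurrency(4, 4))
print(exceeds_default_limit(4, 4))
```

A configuration such as 41 nodes with 8 processes per node would exceed the default value under this assumption.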

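Forecast at scale: Distributed DNN training

As described earlier in this article, you can enable learning for deep neural networks (DNN). To learn how distributed training works for DNN forecasting tasks, see Distributed deep neural network training (preview).

For scenarios that require large amounts of data, distributed training with AutoML is available for a limited set of models. You can find more information and code samples in AutoML at scale: Distributed training.

Explore example notebooks

Detailed code samples that demonstrate advanced forecasting configurations are available in the AutoML forecasting sample notebooks GitHub repository. Here are some of the example notebooks: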

Last updated on 2026-02-02
