Tutorial: Power BI integration - Drag and drop to create the predictive model (part 1 of 2)

In part 1 of this tutorial, you train and deploy a predictive machine learning model by using the Azure Machine Learning designer. The designer is a low-code, drag-and-drop user interface. In part 2, you'll use the model to predict outcomes in Microsoft Power BI.

In this tutorial, you:

  • Create an Azure Machine Learning compute instance.
  • Create an Azure Machine Learning inference cluster.
  • Create a dataset.
  • Train a regression model.
  • Deploy the model to a real-time scoring endpoint.

There are three ways to create and deploy the model you'll use in Power BI. This article covers "Option B: Train and deploy models by using the designer." This option is a low-code authoring experience that uses the designer interface.

You could instead use one of the other two options.

Prerequisites

Create compute to train and score

In this section, you create a compute instance, which is used to train machine learning models. You also create an inference cluster to host the deployed model for real-time scoring.

Sign in to Azure Machine Learning Studio. In the menu on the left, select Compute and then New:

Screenshot showing how to create a compute instance.

On the Create compute instance page, select a VM size. For this tutorial, select a Standard_D11_v2 VM. Then select Next.

On the Settings page, name your compute instance. Then select Create.

Tip

You can also use the compute instance to create and run notebooks.

Your compute instance Status is now Creating. The machine takes around 4 minutes to provision.

While you wait, on the Compute page, select the Inference clusters tab. Then select New:

Screenshot showing how to create an inference cluster.

On the Create inference cluster page, select a region and a VM size. For this tutorial, select a Standard_D11_v2 VM. Then select Next.

On the Configure Settings page:

  1. Provide a valid compute name.
  2. Select Dev-test as the cluster purpose. This option creates a single node to host the deployed model.
  3. Select Create.

Your inference cluster Status is now Creating. The single-node cluster takes around 4 minutes to deploy.
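The same two compute resources can also be created in code rather than in the studio. The following is a minimal sketch, not part of the designer workflow. It assumes the Azure Machine Learning Python SDK v1 (the `azureml-core` package) is installed and that an inference cluster corresponds to an `AksCompute` target. The resource names `my-instance` and `my-aks` are hypothetical placeholders.

```python
def create_compute_instance(workspace, name="my-instance"):
    """Provision a Standard_D11_v2 compute instance for training.

    `name` is a hypothetical placeholder; pick your own.
    """
    # Imported lazily so this module loads even without azureml-core installed.
    from azureml.core.compute import ComputeInstance, ComputeTarget

    config = ComputeInstance.provisioning_configuration(vm_size="STANDARD_D11_V2")
    instance = ComputeTarget.create(workspace, name, config)
    # Blocks until provisioning finishes (a few minutes).
    instance.wait_for_completion(show_output=True)
    return instance


def create_inference_cluster(workspace, name="my-aks"):
    """Provision a single-node, dev-test inference cluster for scoring.

    `name` is a hypothetical placeholder; pick your own.
    """
    from azureml.core.compute import AksCompute, ComputeTarget

    config = AksCompute.provisioning_configuration(
        vm_size="Standard_D11_v2",
        agent_count=1,  # dev-test purpose: one node hosts the model
        cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST,
    )
    cluster = ComputeTarget.create(workspace, name, config)
    cluster.wait_for_completion(show_output=True)
    return cluster
```

To try it, load a workspace with `Workspace.from_config()` from `azureml.core` and pass it to each function. Both calls block until the resource is ready.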

Create a dataset

In this tutorial, you use the Diabetes dataset. This dataset is available in Azure Open Datasets.

To create the dataset, in the menu on the left, select Datasets. Then select Create dataset. You see the following options:

Screenshot showing how to create a new dataset.

Select From Open Datasets. On the Create dataset from Open Datasets page:

  1. Use the search bar to find diabetes.
  2. Select Sample: Diabetes.
  3. Select Next.
  4. Name your dataset diabetes.
  5. Select Create.

To explore the data, select the dataset and then select Explore:

Screenshot showing how to explore the dataset.

The data has 10 baseline input variables: age, sex, body mass index, average blood pressure, and six blood serum measurements. It also has one target variable, named Y. This target variable is a quantitative measure of diabetes progression one year after the baseline.
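To make the shape of a record concrete, here is a small sketch that separates the ten inputs from the target. The column names other than Y and the sample values are assumptions for illustration. Check the Explore view for the names your dataset actually uses.

```python
# Assumed column names for the ten baseline inputs (only "Y" is named in the text).
FEATURE_COLUMNS = ["AGE", "SEX", "BMI", "BP", "S1", "S2", "S3", "S4", "S5", "S6"]
TARGET_COLUMN = "Y"


def split_features_and_target(row):
    """Return (inputs, target) for one record of the dataset."""
    features = {name: row[name] for name in FEATURE_COLUMNS}
    return features, row[TARGET_COLUMN]


# Illustrative values only; not guaranteed to match a real record.
sample_row = {
    "AGE": 59, "SEX": 2, "BMI": 32.1, "BP": 101.0,
    "S1": 157, "S2": 93.2, "S3": 38.0, "S4": 4.0, "S5": 4.8598, "S6": 87,
    "Y": 151,
}

features, target = split_features_and_target(sample_row)
print(len(features), "inputs; target Y =", target)
```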

Create a machine learning model by using the designer

After you create the compute and the dataset, you can use the designer to create the machine learning model. In Azure Machine Learning Studio, select Designer and then New pipeline:

Screenshot showing how to create a new pipeline.

You see a blank canvas and a Settings menu:

Screenshot showing how to select a compute target.

On the Settings menu, choose Select compute target. Select the compute instance you created earlier, and then select Save. Change the Draft name to something more memorable, such as diabetes-model. Finally, enter a description.

In the list of assets, expand Datasets and locate the diabetes dataset. Drag this component onto the canvas:

Screenshot showing how to drag a component onto the canvas.

Next, drag the following components onto the canvas:

  1. Linear Regression (located in Machine Learning Algorithms)
  2. Train Model (located in Model Training)

On your canvas, notice the circles at the top and bottom of the components. These circles are ports.

Screenshot showing the ports on unconnected components.

Now wire the components together. Select the port at the bottom of the diabetes dataset and drag it to the port on the upper-right side of the Train Model component. Then select the port at the bottom of the Linear Regression component and drag it to the port on the upper-left side of the Train Model component.

Choose the dataset column to use as the label (target) variable to predict. Select the Train Model component and then select Edit column.

In the dialog box, select Enter column name > Y:

Screenshot showing how to select the label column.

Select Save. Your machine learning workflow should look like this:

Screenshot showing the connected components.
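Conceptually, the Train Model step fits the Linear Regression algorithm to the label column Y. The sketch below shows that idea with a single input and closed-form least squares. It is an illustration under simplifying assumptions, not the designer's implementation: the real component uses all ten inputs and its own solver settings, and the data here is made up.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Score one input value with the fitted line."""
    return intercept + slope * x


# Made-up training data in which the label is exactly 1 + 2 * x.
inputs = [1.0, 2.0, 3.0, 4.0]
labels = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(inputs, labels)
print("slope:", slope, "intercept:", intercept)
print("prediction for x=5:", predict(5.0, slope, intercept))
```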

Select Submit. Under Experiment, select Create new. Name the experiment, and then select Submit.

Note

Your experiment's first run should take around 5 minutes. Subsequent runs are much quicker because the designer caches components that have already been run to reduce latency.
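The effect of caching can be pictured as memoization: a step that has already run with the same inputs returns its stored result instead of running again. The sketch below is purely conceptual. The class `StepCache` and its key scheme are hypothetical and say nothing about how the designer implements caching internally.

```python
import hashlib
import json


class StepCache:
    """Reuse a step's result when its name and inputs are unchanged."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    def run(self, step_name, func, inputs):
        # Key on the step name plus a hash of its JSON-serializable inputs.
        payload = json.dumps([step_name, inputs], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key in self._results:
            self.hits += 1
            return self._results[key]
        result = func(inputs)
        self._results[key] = result
        return result


calls = []


def slow_mean(values):
    """Stand-in for an expensive pipeline step."""
    calls.append(1)
    return sum(values) / len(values)


cache = StepCache()
first = cache.run("mean", slow_mean, [1, 2, 3])      # runs the step
second = cache.run("mean", slow_mean, [1, 2, 3])     # served from the cache
third = cache.run("mean", slow_mean, [1, 2, 3, 4])   # inputs changed, runs again
print(first, second, third, "step executions:", len(calls))
```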

When the experiment finishes, you see this view:

Screenshot showing a completed run.

To inspect the experiment logs, select Train Model and then select Outputs + logs.

Deploy the model

To deploy the model, at the top of the canvas, select Create inference pipeline > Real-time inference pipeline:

Screenshot showing where to select the real-time inference pipeline.

The pipeline condenses to just the components necessary to score the model. When you score new data, you won't know the target variable values, so you can remove Y from the dataset.

To remove Y, add a Select Columns in Dataset component to the canvas. Wire the component so that the diabetes dataset is its input and its output feeds into the Score Model component:

Screenshot showing how to remove a column.

On the canvas, select the Select Columns in Dataset component, and then select Edit Columns.

In the Select columns dialog box, choose By name. Then ensure that all the input variables are selected but the target is not selected:

Screenshot showing the column selection settings.

Select Save.
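In code terms, this column selection keeps every input column and leaves out the target. A minimal sketch, using made-up records with only a few columns; the helper name `drop_columns` and the column names other than Y are illustrative assumptions:

```python
def drop_columns(rows, excluded):
    """Return copies of the rows without the excluded column names."""
    excluded = set(excluded)
    return [
        {name: value for name, value in row.items() if name not in excluded}
        for row in rows
    ]


# Made-up records; a real record has ten input columns plus Y.
training_rows = [
    {"AGE": 59, "BMI": 32.1, "Y": 151},
    {"AGE": 48, "BMI": 21.6, "Y": 75},
]

scoring_rows = drop_columns(training_rows, ["Y"])
print(scoring_rows)
```

The original rows are left unchanged, so the same records can still be used for training.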

Finally, select the Score Model component and ensure the Append score columns to output check box is cleared. To reduce latency, the predictions are sent back without the inputs.

Screenshot showing the Score Model component settings.
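The difference between the two settings can be sketched as follows. With the check box cleared, each result carries only the prediction. With it selected, each result repeats the input columns as well, which makes the response larger. The function name, the score column name `Scored Labels`, and all values are assumptions for illustration.

```python
SCORE_COLUMN = "Scored Labels"  # assumed name of the prediction column


def build_output(input_rows, predictions, append_scores_to_input):
    """Shape scoring results with or without the original input columns."""
    if len(input_rows) != len(predictions):
        raise ValueError("one prediction is required per input row")
    if append_scores_to_input:
        # Inputs are echoed back alongside each prediction.
        return [
            {**row, SCORE_COLUMN: score}
            for row, score in zip(input_rows, predictions)
        ]
    # Only the predictions are returned.
    return [{SCORE_COLUMN: score} for score in predictions]


rows = [{"AGE": 59, "BMI": 32.1}]  # made-up input
scores = [151.0]                   # made-up prediction

compact = build_output(rows, scores, append_scores_to_input=False)
verbose = build_output(rows, scores, append_scores_to_input=True)
print(compact)
print(verbose)
```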

At the top of the canvas, select Submit.

After you successfully run the inference pipeline, you can deploy the model to your inference cluster. Select Deploy.

In the Set-up real-time endpoint dialog box, select Deploy new real-time endpoint. Name the endpoint my-diabetes-model. Select the inference cluster you created earlier, and then select Deploy:

Screenshot showing the real-time endpoint settings.
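Once deployed, the endpoint accepts HTTP scoring requests. The sketch below only builds such a request and does not send it. The URL, the key, the `Inputs`/`WebServiceInput0` payload layout, and the bearer-token header are all assumptions for illustration. Take the real scoring URI, key, and input schema from the endpoint's details in the studio.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, rows):
    """Build (but do not send) a POST request for a real-time endpoint."""
    # Assumed payload layout; confirm it against your endpoint's schema.
    payload = {"Inputs": {"WebServiceInput0": rows}, "GlobalParameters": {}}
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # assumed key-based auth
    }
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


# Hypothetical placeholders, not a real endpoint or key.
request = build_scoring_request(
    "https://example.invalid/score",
    "<your-endpoint-key>",
    [{"AGE": 59, "BMI": 32.1}],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it. Part 2 of this tutorial consumes the endpoint from Power BI instead.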

Next steps

In this tutorial, you saw how to train and deploy a designer model. In the next part, you learn how to consume (score) this model in Power BI.