Tutorial: Power BI integration - Create the predictive model by using automated machine learning (part 1 of 2)

In part 1 of this tutorial, you train and deploy a predictive machine learning model by using automated machine learning (ML) in Azure Machine Learning Studio. In part 2, you'll use the best-performing model to predict outcomes in Microsoft Power BI.

In this tutorial, you:

  • Create an Azure Machine Learning compute cluster.
  • Create a dataset.
  • Create an automated machine learning run.
  • Deploy the best model to a real-time scoring endpoint.

There are three ways to create and deploy the model you'll use in Power BI. This article covers "Option C: Train and deploy models by using automated machine learning in the studio." This option is a no-code authoring experience. It fully automates data preparation and model training.

But you could instead use one of the other two options.

Prerequisites

Create a compute cluster

Automated machine learning trains many machine learning models to find the "best" algorithm and parameters. Azure Machine Learning runs the model training in parallel across a compute cluster.
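The idea of training several candidates in parallel and keeping the best one can be sketched in a few lines of plain Python. This is only a toy illustration, not how Azure Machine Learning is implemented: the three candidate "algorithms", the synthetic data, and every function name here are assumptions made for the example, and threads stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
import math
import random

# Synthetic data (hypothetical): y = 3x + 20 plus a little noise.
rng = random.Random(0)
xs = [float(i) for i in range(40)]
ys = [3.0 * x + 20.0 + rng.gauss(0.0, 1.0) for x in xs]
train = list(zip(xs[0::2], ys[0::2]))  # even positions for training
valid = list(zip(xs[1::2], ys[1::2]))  # odd positions for validation


def fit_mean(data):
    """Candidate 1: always predict the average target."""
    avg = sum(y for _, y in data) / len(data)
    return lambda x: avg


def fit_through_origin(data):
    """Candidate 2: least-squares line forced through the origin."""
    slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x: slope * x


def fit_linear(data):
    """Candidate 3: ordinary least-squares line with an intercept."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = sum((x - mx) * (y - my) for x, y in data) / sum(
        (x - mx) ** 2 for x, _ in data
    )
    return lambda x: my + slope * (x - mx)


CANDIDATES = {
    "mean": fit_mean,
    "through_origin": fit_through_origin,
    "linear": fit_linear,
}


def rmse(model, data):
    """Root mean squared error of a model on (x, y) pairs."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))


def train_and_score(name):
    """Train one candidate and score it on the validation split."""
    model = CANDIDATES[name](train)
    return name, rmse(model, valid)


def search(max_workers=4):
    """Train all candidates concurrently and sort them, best (lowest RMSE) first."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train_and_score, CANDIDATES))
    return sorted(results, key=lambda item: item[1])


leaderboard = search()
best_name, best_rmse = leaderboard[0]
print(f"best candidate: {best_name} (validation RMSE {best_rmse:.2f})")
```

Automated ML searches over far more algorithms and hyperparameters than this, but the shape is the same: independent training jobs that can run side by side, followed by a comparison on a common metric.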

To begin, in Azure Machine Learning Studio, in the menu on the left, select Compute. Open the Compute clusters tab. Then select New:

Screenshot showing how to create a compute cluster.

On the Create compute cluster page:

  1. Select a VM size. For this tutorial, a Standard_D11_v2 machine is fine.
  2. Select Next.
  3. Provide a valid compute name.
  4. Keep Minimum number of nodes at 0.
  5. Change Maximum number of nodes to 4.
  6. Select Create.

The status of your cluster changes to Creating.

Note

The new cluster has 0 nodes, so no compute costs are incurred. You incur costs only when the automated machine learning job runs. The cluster scales back to 0 nodes automatically after 120 seconds of idle time.
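To make the scaling behavior concrete, here is a minimal sketch of the cluster settings chosen above and a simplified scaling rule. It is a toy model for illustration only, not Azure's actual scheduler: the class, its field names, and the `target_nodes` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSettings:
    """The values entered on the Create compute cluster page (field names are illustrative)."""
    vm_size: str = "Standard_D11_v2"
    min_nodes: int = 0
    max_nodes: int = 4
    idle_seconds_before_scaledown: int = 120


def target_nodes(settings, current_nodes, queued_jobs, idle_seconds):
    """Return the node count a simplified autoscaler would aim for."""
    if queued_jobs > 0:
        # Scale out for pending work, but never beyond the configured limits.
        return min(settings.max_nodes, max(settings.min_nodes, queued_jobs))
    if idle_seconds >= settings.idle_seconds_before_scaledown:
        # Idle long enough: release nodes down to the minimum (0 here, so no cost).
        return settings.min_nodes
    # Idle, but not for long enough yet: keep what is running.
    return current_nodes


settings = ClusterSettings()
print(target_nodes(settings, current_nodes=0, queued_jobs=10, idle_seconds=0))
print(target_nodes(settings, current_nodes=4, queued_jobs=0, idle_seconds=60))
print(target_nodes(settings, current_nodes=4, queued_jobs=0, idle_seconds=120))
```

With a minimum of 0 nodes, the cluster costs nothing while idle; the trade-off is a short wait for nodes to start when a job is submitted.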

Create a dataset

In this tutorial, you use the Diabetes dataset. This dataset is available in Azure Open Datasets.

To create the dataset, in the menu on the left, select Datasets. Then select Create dataset. You see the following options:

Screenshot showing how to create a new dataset.

Select From Open Datasets. Then on the Create dataset from Open Datasets page:

  1. Use the search bar to find diabetes.
  2. Select Sample: Diabetes.
  3. Select Next.
  4. Name your dataset diabetes.
  5. Select Create.

To explore the data, select the dataset and then select Explore:

Screenshot showing how to explore a dataset.

The data has 10 baseline input variables: age, sex, body mass index, average blood pressure, and six blood serum measurements. It also has one target variable, named Y. This target variable is a quantitative measure of diabetes progression one year after the baseline.
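The layout described above (10 inputs plus the target Y) can be written down as a small schema. In this sketch, only the target name Y comes from the tutorial; the input column names (AGE, SEX, BMI, BP, S1 to S6) are an assumption about how the dataset labels its columns, and the example row contains illustrative values only.

```python
# Assumed column names for the 10 baseline inputs; check them in the Explore view.
FEATURE_COLUMNS = ["AGE", "SEX", "BMI", "BP", "S1", "S2", "S3", "S4", "S5", "S6"]
TARGET_COLUMN = "Y"  # disease progression one year after baseline


def split_row(row):
    """Split one record into (features, target), failing loudly on missing columns."""
    missing = [c for c in FEATURE_COLUMNS + [TARGET_COLUMN] if c not in row]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    features = {c: row[c] for c in FEATURE_COLUMNS}
    return features, row[TARGET_COLUMN]


# Illustrative values, not guaranteed to match any record in the dataset.
example_row = {
    "AGE": 59, "SEX": 2, "BMI": 32.1, "BP": 101.0,
    "S1": 157, "S2": 93.2, "S3": 38.0, "S4": 4.0, "S5": 4.8598, "S6": 87,
    "Y": 151,
}

features, target = split_row(example_row)
print(len(features), "inputs; target =", target)
```

Because Y is a continuous quantity rather than a category, predicting it is a regression problem, which is why the tutorial selects the Regression task later on.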

Create an automated machine learning run

In Azure Machine Learning Studio, in the menu on the left, select Automated ML. Then select New Automated ML run:

Screenshot showing how to create a new automated ML run.

Next, select the diabetes dataset you created earlier. Then select Next:

Screenshot showing how to select a dataset.

On the Configure run page:

  1. Under Experiment name, select Create new.
  2. Name the experiment.
  3. In the Target column field, select Y.
  4. In the Select compute cluster field, select the compute cluster you created earlier.

Your completed form should look like this:

Screenshot showing how to configure automated ML.

Finally, select a machine learning task. In this case, the task is Regression:

Screenshot showing how to configure a task.

Select Finish.

Important

Automated machine learning takes around 30 minutes to finish training the 100 models.

Deploy the best model

When automated machine learning finishes, select the Models tab to see all the machine learning models that were tried. The models are ordered by performance, with the best-performing model shown first. After you select the best model, the Deploy button is enabled:

Screenshot showing the list of models.
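"Ordered by performance" means the models are ranked on an error metric. As a rough sketch, the snippet below ranks three hypothetical models by normalized root mean squared error (RMSE divided by the range of the actual values), which is one common regression metric. Which metric the studio uses for your run is shown in the UI; the model names and numbers here are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def normalized_rmse(actual, predicted):
    """RMSE divided by the range of the actual values (lower is better)."""
    spread = max(actual) - min(actual)
    if spread == 0:
        raise ValueError("actual values have zero range")
    return rmse(actual, predicted) / spread


# Hypothetical validation targets and predictions from three made-up models.
actual = [100.0, 150.0, 200.0, 250.0]
predictions = {
    "model_a": [110.0, 140.0, 210.0, 240.0],
    "model_b": [100.0, 150.0, 200.0, 310.0],
    "model_c": [175.0, 175.0, 175.0, 175.0],
}

# Rank models from best (lowest error) to worst.
ranked = sorted(
    ((name, normalized_rmse(actual, preds)) for name, preds in predictions.items()),
    key=lambda item: item[1],
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Normalizing by the range makes scores comparable across targets with different scales, at the cost of being sensitive to outliers in the actual values.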

Select Deploy to open the Deploy a model window:

  1. Name your model service diabetes-model.
  2. Select Azure Container Instance.
  3. Select Deploy.

You should see a message that states that the model was deployed successfully.
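Once deployed, the model is served from a real-time scoring endpoint that accepts HTTP requests. The sketch below builds such a request without sending it. Several things are assumptions here: the URL is a placeholder, the `{"data": [...]}` payload shape and the bearer-token header are a common pattern rather than something this tutorial confirms, and the column names are the same assumed ones as in the dataset. Check the endpoint's details page in the studio for the real URI and input schema.

```python
import json
import urllib.request

# Placeholder only; copy the real scoring URI from your endpoint's details page.
SCORING_URI = "https://example.invalid/score"


def build_scoring_request(rows, scoring_uri=SCORING_URI, api_key=None):
    """Build (but do not send) a POST request carrying rows to score as JSON."""
    body = json.dumps({"data": rows}).encode("utf-8")  # assumed payload shape
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed if key-based authentication is enabled on the endpoint.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


# One illustrative input row (no Y column: that is what the model predicts).
row = {
    "AGE": 59, "SEX": 2, "BMI": 32.1, "BP": 101.0,
    "S1": 157, "S2": 93.2, "S3": 38.0, "S4": 4.0, "S5": 4.8598, "S6": 87,
}
req = build_scoring_request([row])
print(req.get_method(), req.full_url)

# To actually call a live endpoint, you would send the request, for example:
#   with urllib.request.urlopen(req) as response:
#       print(json.loads(response.read()))
```

Part 2 of the tutorial handles this call for you inside Power BI, so you do not need to write any of this code to follow along.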

Next steps

In this tutorial, you saw how to train and deploy a machine learning model by using automated machine learning. In the next tutorial, you'll learn how to consume (score) this model in Power BI.