Train and deploy an image classification TensorFlow model using the Azure Machine Learning Visual Studio Code Extension

Learn how to train and deploy an image classification model to recognize handwritten digits using TensorFlow and the Azure Machine Learning Visual Studio Code Extension.

In this tutorial, you learn the following tasks:

  • Understand the code
  • Create a workspace
  • Create an experiment
  • Configure compute targets
  • Create a run configuration
  • Train a model
  • Register a model
  • Deploy a model

Prerequisites

Understand the code

The code for this tutorial uses TensorFlow to train an image classification machine learning model that categorizes handwritten digits from 0 to 9. It does so by creating a neural network that takes the pixel values of a 28 px x 28 px image as input and outputs a list of 10 probabilities, one for each of the digits being classified. Below is a sample of what the data looks like.

MNIST digits
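To make the input and output shapes concrete, here is a minimal sketch in plain Python. It is not the tutorial's train.py: the single linear layer, the random weights, and the names `softmax` and `classify` are assumptions made for illustration. It only shows how 784 pixel values (28 x 28) can be turned into 10 probabilities that sum to 1.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image, weights, biases):
    """One linear layer followed by softmax (illustrative, untrained)."""
    # Flatten the 28 x 28 grid into 784 values scaled to the range [0, 1].
    pixels = [p / 255.0 for row in image for p in row]
    logits = [
        sum(w * p for w, p in zip(weights[digit], pixels)) + biases[digit]
        for digit in range(10)
    ]
    return softmax(logits)


# Random stand-ins for a real image and for trained parameters.
rng = random.Random(0)
image = [[rng.randint(0, 255) for _ in range(28)] for _ in range(28)]
weights = [[rng.uniform(-0.01, 0.01) for _ in range(784)] for _ in range(10)]
biases = [0.0] * 10

probabilities = classify(image, weights, biases)
predicted_digit = max(range(10), key=lambda d: probabilities[d])
```

Because the weights here are random, the predicted digit is meaningless; training is what adjusts the weights so that the highest probability lands on the correct digit.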

Get the code for this tutorial by downloading and unzipping the VS Code Tools for AI repository anywhere on your computer.

Create a workspace

The first thing you have to do to build an application in Azure Machine Learning is create a workspace. A workspace contains the resources to train models as well as the trained models themselves. For more information, see what is a workspace.

  1. On the Visual Studio Code activity bar, select the Azure icon to open the Azure Machine Learning view.

  2. Right-click your Azure subscription and select Create Workspace.

    Create a workspace

  3. By default, a name is generated containing the date and time of creation. In the text input box, change the name to "TeamWorkspace" and press Enter.

  4. Select Create a new resource group.

  5. Name your resource group "TeamWorkspace-rg" and press Enter.

  6. Choose a location for your workspace. It's recommended to choose the location closest to where you plan to deploy your model.

  7. When prompted to select the type of workspace, select Basic to create a basic workspace. For more information on different workspace offerings, see Azure Machine Learning overview.

At this point, a request to Azure is made to create a new workspace in your account. After a few minutes, the new workspace appears in your subscription node.

Create an experiment

One or more experiments can be created in your workspace to track and analyze individual model training runs. Runs can be done in the Azure cloud or on your local machine.

  1. On the Visual Studio Code activity bar, select the Azure icon. The Azure Machine Learning view appears.

  2. Expand your subscription node.

  3. Expand the TeamWorkspace node.

  4. Right-click the Experiments node.

  5. Select Create Experiment from the context menu.

    Create an experiment

  6. Name your experiment "MNIST" and press Enter to create the new experiment.

As with workspaces, a request is sent to Azure to create an experiment with the provided configuration. After a few minutes, the new experiment appears in the Experiments node of your workspace.

Configure compute targets

A compute target is the computing resource or environment where you run scripts and deploy trained models. For more information, see the Azure Machine Learning compute targets documentation.

To create a compute target:

  1. On the Visual Studio Code activity bar, select the Azure icon. The Azure Machine Learning view appears.

  2. Expand your subscription node.

  3. Expand the TeamWorkspace node.

  4. Under the workspace node, right-click the Compute node and choose Create Compute.

    Create a compute target

  5. Select Azure Machine Learning Compute (AmlCompute). Azure Machine Learning Compute is a managed compute infrastructure that lets you easily create a single-node or multi-node compute that can be shared with other users in your workspace.

  6. Choose a VM size. Select Standard_F2s_v2 from the list of options. The size of your VM affects how long it takes to train your models. For more information on VM sizes, see sizes for Linux virtual machines in Azure.

  7. Name your compute "TeamWkspc-com" and press Enter to create your compute.

    A file appears in VS Code with content similar to the following:

    {
        "location": "chinaeast",
        "tags": {},
        "properties": {
            "computeType": "AmlCompute",
            "description": "",
            "properties": {
                "vmSize": "Standard_F2s_v2",
                "vmPriority": "dedicated",
                "scaleSettings": {
                    "maxNodeCount": 4,
                    "minNodeCount": 0,
                    "nodeIdleTimeBeforeScaleDown": 120
                },
                "userAccountCredentials": {
                    "adminUserName": "",
                    "adminUserPassword": "",
                    "adminUserSshPublicKey": ""
                },
                "subnetName": "",
                "vnetName": "",
                "vnetResourceGroupName": "",
                "remoteLoginPortPublicAccess": ""
            }
        }
    }
    
  8. When satisfied with the configuration, open the command palette by selecting View > Command Palette.

  9. Enter the following command into the command palette to save your compute configuration.

    Azure ML: Save and Continue
    

After a few minutes, the new compute target appears in the Compute node of your workspace.
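Before saving the compute configuration, it can help to sanity-check the scale settings. The sketch below is an assumption-laden illustration, not part of the extension: `check_scale_settings` is a hypothetical helper, and the rules it applies (a non-negative minimum, and a maximum of at least one node that is not below the minimum) are common-sense checks rather than documented validation logic. The JSON is a trimmed copy of the sample file shown in step 7.

```python
import json

# Trimmed copy of the compute configuration shown in the tutorial.
compute_config = json.loads("""
{
    "properties": {
        "computeType": "AmlCompute",
        "properties": {
            "vmSize": "Standard_F2s_v2",
            "vmPriority": "dedicated",
            "scaleSettings": {
                "maxNodeCount": 4,
                "minNodeCount": 0,
                "nodeIdleTimeBeforeScaleDown": 120
            }
        }
    }
}
""")


def check_scale_settings(config):
    """Return a list of problems found in the scale settings (hypothetical helper)."""
    scale = config["properties"]["properties"]["scaleSettings"]
    problems = []
    if scale["minNodeCount"] < 0:
        problems.append("minNodeCount must not be negative")
    if scale["maxNodeCount"] < 1:
        problems.append("maxNodeCount must be at least 1")
    if scale["maxNodeCount"] < scale["minNodeCount"]:
        problems.append("maxNodeCount must not be below minNodeCount")
    return problems


problems = check_scale_settings(compute_config)
print(problems or "scale settings look consistent")
```

With the sample values (a minimum of 0 and a maximum of 4 nodes), the helper reports no problems.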

Create a run configuration

When you submit a training run to a compute target, you also submit the configuration needed to run the training job, for example, the script that contains the training code and the Python dependencies needed to run it.

To create a run configuration:

  1. On the Visual Studio Code activity bar, select the Azure icon. The Azure Machine Learning view appears.

  2. Expand your subscription node.

  3. Expand the TeamWorkspace > Compute node.

  4. Under the compute node, right-click the TeamWkspc-com compute node and choose Create Run Configuration.

    Create a run configuration

  5. Name your run configuration "MNIST-rc" and press Enter to create your run configuration.

  6. Then, select Create new Azure ML Environment. Environments define the dependencies required to run your scripts.

  7. Name your environment "MNIST-env" and press Enter.

  8. Select Conda dependencies file from the list.

  9. Press Enter to browse for the Conda dependencies file. In this case, the dependencies file is the env.yml file inside the vscode-tools-for-ai/mnist-vscode-docs-sample directory.

    A file appears in VS Code with content similar to the following:

    {
        "name": "MNIST-env",
        "version": "1",
        "python": {
            "interpreterPath": "python",
            "userManagedDependencies": false,
            "condaDependencies": {
                "name": "vs-code-azure-ml-tutorial",
                "channels": [
                    "defaults"
                ],
                "dependencies": [
                    "python=3.6.2",
                    "tensorflow=1.15.0",
                    "pip",
                    {
                        "pip": [
                            "azureml-defaults"
                        ]
                    }
                ]
            },
            "baseCondaEnvironment": null
        },
        "environmentVariables": {},
        "docker": {
            "baseImage": "mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04",
            "baseDockerfile": null,
            "baseImageRegistry": {
                "address": null,
                "username": null,
                "password": null
            },
            "enabled": false,
            "arguments": []
        },
        "spark": {
            "repositories": [],
            "packages": [],
            "precachePackages": true
        },
        "inferencingStackVersion": null
    }
    
  10. Once you're satisfied with your configuration, save it by opening the command palette and entering the following command:

    Azure ML: Save and Continue
    
  11. Press Enter to browse for the script file to run on the compute. In this case, the script to train the model is the train.py file inside the vscode-tools-for-ai/mnist-vscode-docs-sample directory.

    A file called MNIST-rc.runconfig appears in VS Code with content similar to the following:

    {
        "script": "train.py",
        "framework": "Python",
        "communicator": "None",
        "target": "TeamWkspc-com",
        "environment": {
            "name": "MNIST-env",
            "version": "1",
            "python": {
                "interpreterPath": "python",
                "userManagedDependencies": false,
                "condaDependencies": {
                    "name": "vs-code-azure-ml-tutorial",
                    "channels": [
                        "defaults"
                    ],
                    "dependencies": [
                        "python=3.6.2",
                        "tensorflow=1.15.0",
                        "pip",
                        {
                            "pip": [
                                "azureml-defaults"
                            ]
                        }
                    ]
                },
                "baseCondaEnvironment": null
            },
            "environmentVariables": {},
            "docker": {
                "baseImage": "mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04",
                "baseDockerfile": null,
                "baseImageRegistry": {
                    "address": null,
                    "username": null,
                    "password": null
                },
                "enabled": false,
                "arguments": []
            },
            "spark": {
                "repositories": [],
                "packages": [],
                "precachePackages": true
            },
            "inferencingStackVersion": null
        },
        "history": {
            "outputCollection": true,
            "snapshotProject": false,
            "directoriesToWatch": [
                "logs"
            ]
        }
    }
    
  12. Once you're satisfied with your configuration, save it by opening the command palette and entering the following command:

    Azure ML: Save and Continue
    

The MNIST-rc run configuration is added under the TeamWkspc-com compute node, and the MNIST-env environment configuration is added under the Environments node.
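The dependencies list in the environment mixes plain Conda package strings with a nested object that holds pip packages. As a rough illustration of that structure, the sketch below separates the two groups. The helper name `split_dependencies` is hypothetical, and the JSON is a trimmed copy of the environment section shown above.

```python
import json

# Trimmed copy of the environment section of the sample run configuration.
environment = json.loads("""
{
    "name": "MNIST-env",
    "python": {
        "condaDependencies": {
            "dependencies": [
                "python=3.6.2",
                "tensorflow=1.15.0",
                "pip",
                {"pip": ["azureml-defaults"]}
            ]
        }
    }
}
""")


def split_dependencies(env):
    """Separate Conda packages from pip packages (hypothetical helper)."""
    conda_packages, pip_packages = [], []
    for item in env["python"]["condaDependencies"]["dependencies"]:
        if isinstance(item, dict):
            # A nested {"pip": [...]} entry lists packages installed with pip.
            pip_packages.extend(item.get("pip", []))
        else:
            conda_packages.append(item)
    return conda_packages, pip_packages


conda_packages, pip_packages = split_dependencies(environment)
print("Conda:", conda_packages)
print("pip:", pip_packages)
```

For the sample environment, the Conda group holds the pinned Python and TensorFlow versions plus pip itself, and the pip group holds azureml-defaults.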

Train the model

During the training process, a TensorFlow model is created by processing the training data and learning the patterns embedded within it for each of the digits being classified.

To run an Azure Machine Learning experiment:

  1. On the Visual Studio Code activity bar, select the Azure icon. The Azure Machine Learning view appears.

  2. Expand your subscription node.

  3. Expand the TeamWorkspace > Experiments node.

  4. Right-click the MNIST experiment.

  5. Select Run Experiment.

    Run an experiment

  6. From the list of compute target options, select the TeamWkspc-com compute target.

  7. Then, select the MNIST-rc run configuration.

  8. At this point, a request is sent to Azure to run your experiment on the selected compute target in your workspace. This process takes several minutes. How long the training job runs depends on several factors, such as the compute type and the size of the training data. To track the progress of your experiment, right-click the current run node and select View Run in Azure portal.

  9. When the dialog requesting to open an external website appears, select Open.

    Track experiment progress

When the model is done training, the status label next to the run node updates to "Completed".

Register the model

Now that you've trained your model, you can register it in your workspace.

To register your model:

  1. On the Visual Studio Code activity bar, select the Azure icon. The Azure Machine Learning view appears.

  2. Expand your subscription node.

  3. Expand the TeamWorkspace > Experiments > MNIST node.

  4. Get the model outputs generated from training the model. Right-click the Run 1 run node and select Download outputs.

    Download model outputs

  5. Choose the directory to save the downloaded outputs to. By default, the outputs are placed in the directory currently opened in Visual Studio Code.

  6. Right-click the Models node and choose Register Model.

    Register a model

  7. Name your model "MNIST-TensorFlow-model" and press Enter.

  8. A TensorFlow model is made up of several files. Select Model folder as the model path format from the list of options.

  9. Select the azureml_outputs/Run_1/outputs/outputs/model directory.

    A file containing your model configuration appears in Visual Studio Code with content similar to the following:

    {
        "modelName": "MNIST-TensorFlow-model",
        "tags": {
            "": ""
        },
        "modelPath": "c:\\Dev\\vscode-tools-for-ai\\mnist-vscode-docs-sample\\azureml_outputs\\Run_1\\outputs\\outputs\\model",
        "description": ""
    }
    
  10. Once you're satisfied with your configuration, save it by opening the command palette and entering the following command:

    Azure ML: Save and Continue
    

After a few minutes, the model appears under the Models node.
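The modelPath value in the sample configuration uses doubled backslashes because a backslash must be escaped inside a JSON string. The sketch below shows how that escaping arises when a Windows path is serialized. The project directory is taken from the sample above and will differ on your machine, and building the file by hand like this is only an illustration, since the extension generates it for you.

```python
import json
from pathlib import PureWindowsPath

# Project location from the sample configuration; yours will differ.
project_dir = PureWindowsPath(r"c:\Dev\vscode-tools-for-ai\mnist-vscode-docs-sample")

# Folder selected in step 9, relative to the downloaded outputs.
model_dir = project_dir / "azureml_outputs" / "Run_1" / "outputs" / "outputs" / "model"

model_config = {
    "modelName": "MNIST-TensorFlow-model",
    "tags": {"": ""},
    "modelPath": str(model_dir),
    "description": "",
}

# json.dumps escapes each backslash, producing the doubled form seen in the file.
config_text = json.dumps(model_config, indent=4)
print(config_text)
```

Reading the file back with `json.loads` restores the single-backslash path, so the doubling is purely a property of the JSON text.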

Deploy the model

In Visual Studio Code, you can deploy your model as a web service to:

  • Azure Container Instances (ACI).
  • Azure Kubernetes Service (AKS).

You don't need to create an ACI container in advance for testing, because ACI containers are created as needed. However, you do need to configure AKS clusters in advance. For more information on deployment options, see deploy models with Azure Machine Learning.

To deploy a web service to ACI:

  1. On the Visual Studio Code activity bar, select the Azure icon. The Azure Machine Learning view appears.

  2. Expand your subscription node.

  3. Expand the TeamWorkspace > Models node.

  4. Right-click MNIST-TensorFlow-model and select Deploy Service from Registered Model.

    Deploy the model

  5. Select Azure Container Instances.

  6. Name your service "mnist-tensorflow-svc" and press Enter.

  7. Choose the script to run in the container by pressing Enter in the input box and browsing for the score.py file in the mnist-vscode-docs-sample directory.

  8. Provide the dependencies needed to run the script by pressing Enter in the input box and browsing for the env.yml file in the mnist-vscode-docs-sample directory.

    A file containing your deployment configuration appears in Visual Studio Code with content similar to the following:

    {
        "name": "mnist-tensorflow-svc",
        "imageConfig": {
            "runtime": "python",
            "executionScript": "score.py",
            "dockerFile": null,
            "condaFile": "env.yml",
            "dependencies": [],
            "schemaFile": null,
            "enableGpu": false,
            "description": ""
        },
        "deploymentConfig": {
            "cpu_cores": 1,
            "memory_gb": 10,
            "tags": {
                "": ""
            },
            "description": ""
        },
        "deploymentType": "ACI",
        "modelIds": [
            "MNIST-TensorFlow-model:1"
        ]
    }
    
  9. Once you're satisfied with your configuration, save it by opening the command palette and entering the following command:

    Azure ML: Save and Continue
    

At this point, a request is sent to Azure to deploy your web service. This process takes several minutes. Once deployed, the new service appears under the Endpoints node.
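The deployment configuration ties the service to a registered model through entries such as "MNIST-TensorFlow-model:1". Reading that string as a model name followed by a version number is an inference from the sample, not documented behavior, and `parse_model_id` below is a hypothetical helper. Under that assumption, the sketch summarizes what the sample file asks Azure to deploy.

```python
# Trimmed copy of the deployment configuration shown in the tutorial.
deployment = {
    "name": "mnist-tensorflow-svc",
    "deploymentConfig": {"cpu_cores": 1, "memory_gb": 10},
    "deploymentType": "ACI",
    "modelIds": ["MNIST-TensorFlow-model:1"],
}


def parse_model_id(model_id):
    """Split 'name:version' at the last colon (assumed format, hypothetical helper)."""
    name, _, version = model_id.rpartition(":")
    return name, int(version)


models = [parse_model_id(model_id) for model_id in deployment["modelIds"]]
resources = deployment["deploymentConfig"]

print(f"Service {deployment['name']} on {deployment['deploymentType']}")
print(f"Resources: {resources['cpu_cores']} CPU core(s), {resources['memory_gb']} GB memory")
for name, version in models:
    print(f"Model {name}, version {version}")
```

Splitting at the last colon keeps the helper working even if a model name itself contained a colon.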

Next steps