Interactive debugging with Visual Studio Code

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

Learn how to interactively debug Azure Machine Learning pipelines and deployments by using Visual Studio Code (VS Code) and debugpy.

Debug and troubleshoot machine learning pipelines

In some cases, you may need to interactively debug the Python code used in your ML pipeline. By using VS Code and debugpy, you can attach to the code as it runs in the training environment.

Prerequisites

  • An Azure Machine Learning workspace that is configured to use an Azure Virtual Network.

  • An Azure Machine Learning pipeline that uses Python scripts as part of the pipeline steps. For example, a PythonScriptStep.

  • An Azure Machine Learning compute cluster, which is in the virtual network and is used by the pipeline for training.

  • A development environment that is in the virtual network. The development environment can be one of the following:

    • An Azure Virtual Machine in the virtual network
    • A compute instance or Notebook VM in the virtual network
    • A client machine that has private network connectivity to the virtual network, either by VPN or via ExpressRoute.

For more information on using an Azure Virtual Network with Azure Machine Learning, see Secure Azure ML experimentation and inference jobs within an Azure Virtual Network.

Tip

Although you can work with Azure Machine Learning resources that are not behind a virtual network, using a virtual network is recommended.

How it works

Your ML pipeline steps run Python scripts. These scripts are modified to perform the following actions; a minimal sketch of the core pattern appears after the list:

  1. Log the IP address of the host that they are running on. You use the IP address to connect the debugger to the script.

  2. Start the debugpy debug component, and wait for a debugger to connect.

  3. From your development environment, monitor the logs created by the training process to find the IP address where the script is running.

  4. Tell VS Code which IP address to connect the debugger to by using a launch.json file.

  5. Attach the debugger and interactively step through the script.
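
The heart of this flow is debugpy's listen-and-wait handshake. The following is a minimal sketch of that pattern, listening on all interfaces with a placeholder port; the sections below show the full, argument-driven version used in a pipeline step:

import debugpy

# Listen for a debugger on all interfaces of the compute node, on a placeholder port.
debugpy.listen(("0.0.0.0", 5678))

# Block until VS Code attaches, then continue with the training code.
debugpy.wait_for_client()
print(f"Debugger attached = {debugpy.is_client_connected()}")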

Configure Python scripts

To enable debugging, make the following changes to the Python script(s) used by steps in your ML pipeline:

  1. Add the following import statements:

    import argparse
    import os
    import debugpy
    import socket
    from azureml.core import Run
    
  2. Add the following arguments. These arguments allow you to enable the debugger as needed, and set the timeout for attaching the debugger:

    parser.add_argument('--remote_debug', action='store_true')
    parser.add_argument('--remote_debug_connection_timeout', type=int,
                        default=300,
                        help=f'Defines how much time the AML compute target '
                        f'will await a connection from a debugger client (VSCODE).')
    parser.add_argument('--remote_debug_client_ip', type=str,
                        help=f'Defines IP Address of VS Code client')
    parser.add_argument('--remote_debug_port', type=int,
                        default=5678,
                        help=f'Defines Port of VS Code client')
    
  3. Add the following statements. These statements load the current run context so that you can log the IP address of the node that the code is running on:

    global run
    run = Run.get_context()
    
  4. Add an if statement that starts debugpy and waits for a debugger to attach. If no debugger attaches before the timeout, the script continues as normal. Make sure to replace the HOST and PORT values in the listen function with your own.

    if args.remote_debug:
        print(f'Timeout for debug connection: {args.remote_debug_connection_timeout}')
        # Log the IP and port
        try:
            ip = args.remote_debug_client_ip
        except:
            print("Need to supply IP address for VS Code client")
        print(f'ip_address: {ip}')
        debugpy.listen(address=(ip, args.remote_debug_port))
        # Wait for the timeout for debugger to attach
        debugpy.wait_for_client()
        print(f'Debugger attached = {debugpy.is_client_connected()}')
    

The following Python example shows a basic train.py file that enables debugging:

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.

import argparse
import os
import debugpy
import socket
from azureml.core import Run

print("In train.py")
print("As a data scientist, this is where I use my training code.")

parser = argparse.ArgumentParser("train")

parser.add_argument("--input_data", type=str, help="input data")
parser.add_argument("--output_train", type=str, help="output_train directory")

# Argument check for remote debugging
parser.add_argument('--remote_debug', action='store_true')
parser.add_argument('--remote_debug_connection_timeout', type=int,
                    default=300,
                    help=f'Defines how much time the AML compute target '
                    f'will await a connection from a debugger client (VSCODE).')
parser.add_argument('--remote_debug_client_ip', type=str,
                    help=f'Defines IP Address of VS Code client')
parser.add_argument('--remote_debug_port', type=int,
                    default=5678,
                    help=f'Defines Port of VS Code client')

# Get run object, so we can find and log the IP of the host instance
global run
run = Run.get_context()

args = parser.parse_args()

# Start debugger if remote_debug is enabled
if args.remote_debug:
    print(f'Timeout for debug connection: {args.remote_debug_connection_timeout}')
    # Log the IP and port
    # ip = socket.gethostbyname(socket.gethostname())
    try:
        ip = args.remote_debug_client_ip
    except:
        print("Need to supply IP address for VS Code client")
    print(f'ip_address: {ip}')
    debugpy.listen(address=(ip, args.remote_debug_port))
    # Wait for the timeout for debugger to attach
    debugpy.wait_for_client()
    print(f'Debugger attached = {debugpy.is_client_connected()}')

print("Argument 1: %s" % args.input_data)
print("Argument 2: %s" % args.output_train)

if args.output_train is not None:
    os.makedirs(args.output_train, exist_ok=True)
    print("%s created" % args.output_train)

Configure ML pipeline

To provide the Python packages needed to start debugpy and get the run context, create an environment and set pip_packages=['debugpy', 'azureml-sdk==<SDK-VERSION>']. Change the SDK version to match the one you are using. The following code snippet demonstrates how to create an environment:

# Use a RunConfiguration to specify some additional requirements for this step.
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

# create a new runconfig object
run_config = RunConfiguration()

# enable Docker 
run_config.environment.docker.enabled = True

# set Docker base image to the default CPU-based image
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE

# use conda_dependencies.yml to create a conda environment in the Docker image for execution
run_config.environment.python.user_managed_dependencies = False

# specify CondaDependencies obj
run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],
                                                                           pip_packages=['debugpy', 'azureml-sdk==<SDK-VERSION>'])

In the Configure Python scripts section, new arguments were added to the scripts used by your ML pipeline steps. The following code snippet demonstrates how to use these arguments to enable debugging for the component and set a timeout. It also demonstrates how to use the environment created earlier by setting runconfig=run_config:

# Use RunConfig from a pipeline step
step1 = PythonScriptStep(name="train_step",
                         script_name="train.py",
                         arguments=['--remote_debug',
                                    '--remote_debug_connection_timeout', 300,
                                    '--remote_debug_client_ip', '<VS-CODE-CLIENT-IP>',
                                    '--remote_debug_port', 5678],
                         compute_target=aml_compute,
                         source_directory=source_directory,
                         runconfig=run_config,
                         allow_reuse=False)
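
The step doesn't run until it's part of a pipeline submitted as an experiment. A minimal sketch of that submission, assuming ws is your workspace object and debug_pipeline is a placeholder experiment name:

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

# Build a pipeline containing the debug-enabled step and submit it as an experiment run.
pipeline = Pipeline(workspace=ws, steps=[step1])
pipeline_run = Experiment(ws, 'debug_pipeline').submit(pipeline)

# Stream the logs; the child run for train_step prints the IP address to attach to.
pipeline_run.wait_for_completion(show_output=True)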

When the pipeline runs, each step creates a child run. If debugging is enabled, the modified script logs information similar to the following text in the 70_driver_log.txt for the child run:

Timeout for debug connection: 300
ip_address: 10.3.0.5

Save the ip_address value. It is used in the next section.

Tip

You can also find the IP address in the run logs for the child run of this pipeline step. For more information on viewing this information, see Monitor Azure ML experiment runs and metrics.
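
If you prefer to retrieve the log programmatically instead of through the studio UI, the following sketch downloads the driver log for each child run and scans it for the ip_address line. It assumes pipeline_run is the submitted pipeline run object and that the log keeps its usual azureml-logs/70_driver_log.txt path:

# Download the driver log for each child run and look for the ip_address entry.
for child_run in pipeline_run.get_children():
    log_path = f'{child_run.id}_70_driver_log.txt'
    child_run.download_file(name='azureml-logs/70_driver_log.txt', output_file_path=log_path)
    with open(log_path) as f:
        for line in f:
            if line.startswith('ip_address:'):
                print(child_run.id, line.strip())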

Configure development environment

  1. To install debugpy on your VS Code development environment, use the following command:

    python -m pip install --upgrade debugpy
    

    For more information on using debugpy with VS Code, see Remote Debugging.

  2. To configure VS Code to communicate with the Azure Machine Learning compute that is running the debugger, create a new debug configuration:

    1. From VS Code, select the Debug menu and then select Open configurations. A file named launch.json opens.

    2. In the launch.json file, find the line that contains "configurations": [, and insert the following text after it. Change the "host": "<IP-ADDRESS>" entry to the IP address returned in your logs in the previous section. Change the "localRoot": "${workspaceFolder}/code/step1" entry to a local directory that contains a copy of the script being debugged:

      {
          "name": "Azure Machine Learning Compute: remote debug",
          "type": "python",
          "request": "attach",
          "port": 5678,
          "host": "<IP-ADDRESS>",
          "redirectOutput": true,
          "pathMappings": [
              {
                  "localRoot": "${workspaceFolder}/code/step1",
                  "remoteRoot": "."
              }
          ]
      }
      

      Important

      If there are already other entries in the configurations section, add a comma (,) after the code that you inserted.

      Tip

      The best practice, especially for pipelines, is to keep the resources for scripts in separate directories so that the code is relevant only to each of the steps. In this example, the localRoot example value references /code/step1.

      If you are debugging multiple scripts in different directories, create a separate configuration section for each script.

    3. Save the launch.json file.

Connect the debugger

  1. Open VS Code and open a local copy of the script.

  2. Set breakpoints where you want the script to stop once you've attached.

  3. While the child process is running the script and Timeout for debug connection is displayed in the logs, use the F5 key or select Debug. When prompted, select the Azure Machine Learning Compute: remote debug configuration. You can also select the debug icon from the side bar, select the Azure Machine Learning: remote debug entry from the Debug dropdown menu, and then use the green arrow to attach the debugger.

    At this point, VS Code connects to debugpy on the compute node and stops at the breakpoint you set previously. You can now step through the code as it runs, view variables, and so on.

    Note

    If the log displays an entry stating Debugger attached = False, the timeout has expired and the script continued without the debugger. Submit the pipeline again and connect the debugger after the Timeout for debug connection message appears, and before the timeout expires.

Debug and troubleshoot deployments

In some cases, you may need to interactively debug the Python code contained in your model deployment. For example, the entry script might be failing and the reason can't be determined by additional logging. By using VS Code and debugpy, you can attach to the code running inside the Docker container.

Important

This method of debugging does not work when using Model.deploy() and LocalWebservice.deploy_configuration to deploy a model locally. Instead, you must create an image by using the Model.package() method.

Local web service deployments require a working Docker installation on your local system. For more information on using Docker, see the Docker documentation. Note that when working with compute instances, Docker is already installed.

Configure development environment

  1. To install debugpy on your local VS Code development environment, use the following command:

    python -m pip install --upgrade debugpy
    

    For more information on using debugpy with VS Code, see Remote Debugging.

  2. To configure VS Code to communicate with the Docker image, create a new debug configuration:

    1. From VS Code, select the Debug menu and then select Open configurations. A file named launch.json opens.

    2. In the launch.json file, find the line that contains "configurations": [, and insert the following text after it:

      {
          "name": "Azure Machine Learning Deployment: Docker Debug",
          "type": "python",
          "request": "attach",
          "connect": {
              "port": 5678,
              "host": "0.0.0.0",
          },
          "pathMappings": [
              {
                  "localRoot": "${workspaceFolder}",
                  "remoteRoot": "/var/azureml-app"
              }
          ]
      }
      

      Important

      If there are already other entries in the configurations section, add a comma (,) after the code that you inserted.

      This section attaches to the Docker container using port 5678.

    3. Save the launch.json file.

Create an image that includes debugpy

  1. Modify the conda environment for your deployment so that it includes debugpy. The following example demonstrates adding it by using the pip_packages parameter:

    from azureml.core.conda_dependencies import CondaDependencies 
    
    
    # Usually a good idea to choose specific version numbers
    # so training is made on same packages as scoring
    myenv = CondaDependencies.create(conda_packages=['numpy==1.15.4',
                                'scikit-learn==0.19.1', 'pandas==0.23.4'],
                                 pip_packages = ['azureml-defaults==1.0.83', 'debugpy'])
    
    with open("myenv.yml","w") as f:
        f.write(myenv.serialize_to_string())
    
  2. To start debugpy and wait for a connection when the service starts, add the following to the top of your score.py file. A minimal sketch of a complete entry script follows this snippet:

    import debugpy
    # Allows other computers to attach to debugpy on this IP address and port.
    debugpy.listen(('0.0.0.0', 5678))
    # Wait for a debugger to attach; wait_for_client() blocks until a client connects.
    debugpy.wait_for_client()
    print("Debugger attached...")
    
  3. Create an image based on the environment definition and pull the image to your local registry.

    Note

    This example assumes that ws points to your Azure Machine Learning workspace, and that model is the model being deployed. The myenv.yml file contains the conda dependencies created in step 1.

    from azureml.core.model import InferenceConfig, Model
    from azureml.core.environment import Environment
    
    
    myenv = Environment.from_conda_specification(name="env", file_path="myenv.yml")
    myenv.docker.base_image = None
    myenv.docker.base_dockerfile = "FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04"
    inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
    package = Model.package(ws, [model], inference_config)
    package.wait_for_creation(show_output=True)  # Or show_output=False to hide the Docker build logs.
    package.pull()
    

    Once the image has been created and downloaded, the image path (including the repository, name, and tag, which in this case is also its digest) is displayed in a message similar to the following. If you prefer to get the path programmatically, see the sketch after the message:

    Status: Downloaded newer image for myregistry.azurecr.io/package@sha256:<image-digest>
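
    If you would rather get the image path programmatically than parse the Docker output, the package object should also expose it (assuming the ModelPackage location property in the SDK version you're using):

    # Full image path, including the registry and digest, for example:
    # myregistry.azurecr.io/package@sha256:<image-digest>
    print(package.location)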
    
  4. To make it easier to work with the image, use the following command to add a tag. Replace myimagepath with the location value from the previous step.

    docker tag myimagepath debug:1
    

    For the rest of the steps, you can refer to the local image as debug:1 instead of the full image path value.

Debug the service

Tip

If you set a timeout for the debugpy connection in the score.py file, you must connect VS Code to the debug session before the timeout expires. Start VS Code, open the local copy of score.py, set a breakpoint, and have it ready to go before using the steps in this section.

For more information on debugging and setting breakpoints, see Debugging.

  1. To start a Docker container using the image, use the following command:

    docker run -it --name debug -p 8000:5001 -p 5678:5678 -v <my_path_to_score.py>:/var/azureml-app/score.py debug:1 /bin/bash
    

    This command mounts your local score.py over the one in the container. Therefore, any changes made in the editor are automatically reflected in the container.

  2. Inside the container, run the following command in the shell:

    runsvdir /var/runit
    
  3. To attach VS Code to debugpy inside the container, open VS Code and use the F5 key or select Debug. When prompted, select the Azure Machine Learning Deployment: Docker Debug configuration. You can also select the debug icon from the side bar, select the Azure Machine Learning Deployment: Docker Debug entry from the Debug dropdown menu, and then use the green arrow to attach the debugger.

    The Debug icon, Start Debugging button, and configuration selector

At this point, VS Code connects to debugpy inside the Docker container and stops at the breakpoint you set previously. You can now step through the code as it runs, view variables, and so on.

For more information on using VS Code to debug Python, see Debug your Python code.

Stop the container

To stop the container, use the following command:

docker stop debug

Next steps

Now that you've set up Visual Studio Code Remote, you can use a compute instance as remote compute from Visual Studio Code to interactively debug your code.

Tutorial: Train your first ML model shows how to use a compute instance with an integrated notebook.