Troubleshoot automated ML experiments in Python

In this guide, learn how to identify and resolve known issues in your automated machine learning experiments with the Azure Machine Learning SDK.

Version dependencies

AutoML's dependencies on newer package versions break compatibility with older SDKs. Models trained with SDK versions after 1.13.0 can't be loaded in older SDKs, because the package versions pinned by earlier AutoML packages are incompatible with the newer versions pinned today.

Expect errors such as:

  • Module not found errors, such as:

    No module named 'sklearn.decomposition._truncated_svd'

  • Import errors, such as:

    ImportError: cannot import name 'RollingOriginValidator'

  • Attribute errors, such as:

    AttributeError: 'SimpleImputer' object has no attribute 'add_indicator'

Resolutions depend on your AutoML SDK training version:

  • If your AutoML SDK training version is greater than 1.13.0, you need pandas==0.25.1 and scikit-learn==0.22.1.

    • If there is a version mismatch, upgrade scikit-learn and/or pandas to the correct version with the following commands:

          pip install --upgrade pandas==0.25.1
          pip install --upgrade scikit-learn==0.22.1
      
  • If your AutoML SDK training version is less than or equal to 1.12.0, you need pandas==0.23.4 and scikit-learn==0.20.3.

    • If there is a version mismatch, downgrade scikit-learn and/or pandas to the correct version with the following commands:

        pip install --upgrade pandas==0.23.4
        pip install --upgrade scikit-learn==0.20.3
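
The version rules above can be written as a small lookup. This is a minimal sketch, not part of the SDK: expected_pins and pip_commands are hypothetical helper names, and versions are assumed to be plain dotted numbers. The guide does not say which pins apply to versions above 1.12.0 but not above 1.13.0, so the sketch returns None for that range instead of guessing.

```python
def parse_version(text):
    """Turn a dotted version such as '1.13.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def expected_pins(sdk_training_version):
    """Return the pandas/scikit-learn pins listed in this guide.

    Returns None for versions the guide does not cover
    (above 1.12.0 but not above 1.13.0).
    """
    version = parse_version(sdk_training_version)
    if version > (1, 13, 0):
        return {"pandas": "0.25.1", "scikit-learn": "0.22.1"}
    if version <= (1, 12, 0):
        return {"pandas": "0.23.4", "scikit-learn": "0.20.3"}
    return None


def pip_commands(pins):
    """Build the pip commands that install the pinned versions."""
    return [
        "pip install --upgrade {}=={}".format(name, version)
        for name, version in pins.items()
    ]
```

For example, pip_commands(expected_pins("1.12.0")) produces the two downgrade commands shown for the older SDK versions.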
      

Setup

AutoML package changes since version 1.0.76 require the previous version to be uninstalled before updating to the new version.

  • ImportError: cannot import name AutoMLConfig

    If you encounter this error after upgrading from an SDK version before v1.0.76 to v1.0.76 or later, resolve it by running pip uninstall azureml-train-automl and then pip install azureml-train-automl. The automl_setup.cmd script does this automatically.

  • automl_setup fails

    • On Windows, run automl_setup from an Anaconda Prompt. If conda is not installed, install Miniconda.

    • Ensure that 64-bit conda version 4.4.10 or later is installed. You can check the bitness with the conda info command: the platform should be win-64 for Windows or osx-64 for Mac. To check the version, use the command conda -V. If you have an earlier version installed, you can update it with the command conda update conda. To check 32-bit by running

    • Ensure that conda is installed.

    • Linux - gcc: error trying to exec 'cc1plus'

      1. If the gcc: error trying to exec 'cc1plus': execvp: No such file or directory error is encountered, install the GCC build tools for your Linux distribution. For example, on Ubuntu, use the command sudo apt-get install build-essential.

      2. Pass a new name as the first parameter to automl_setup to create a new conda environment. View existing conda environments with conda env list and remove them with conda env remove -n <environmentname>.

  • automl_setup_linux.sh fails: If automl_setup_linux.sh fails on Ubuntu Linux with the error unable to execute 'gcc': No such file or directory:

    1. Make sure that outbound ports 53 and 80 are enabled. On an Azure virtual machine, you can do this from the Azure portal by selecting the VM and clicking Networking.
    2. Run the command: sudo apt-get update
    3. Run the command: sudo apt-get install build-essential --fix-missing
    4. Run automl_setup_linux.sh again.
  • configuration.ipynb fails:

    • For local conda, first ensure that automl_setup has successfully run.
    • Ensure that the subscription_id is correct. Find the subscription_id in the Azure portal by selecting All services and then Subscriptions. The characters "<" and ">" should not be included in the subscription_id value. For example, subscription_id = "12345678-90ab-1234-5678-1234567890ab" has the valid format.
    • Ensure that you have Contributor or Owner access to the subscription.
    • Check that the region is one of the supported regions: eastus2, eastus, westcentralus, southeastasia, westeurope, australiaeast, westus2, southcentralus.
    • Ensure that you have access to the region by using the Azure portal.
  • Workspace.from_config fails:

    If the call ws = Workspace.from_config() fails:

    1. Ensure that the configuration.ipynb notebook has run successfully.
    2. If the notebook is being run from a folder that is not under the folder where configuration.ipynb was run, copy the folder aml_config and the file config.json that it contains to the new folder. Workspace.from_config reads the config.json for the notebook folder or its parent folder.
    3. If a new subscription, resource group, workspace, or region is being used, make sure that you run the configuration.ipynb notebook again. Changing config.json directly only works if the workspace already exists in the specified resource group under the specified subscription.
    4. If you want to change the region, change the workspace, resource group, or subscription. Workspace.create does not create or update a workspace that already exists, even if the specified region is different.
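
Two of the setup checks above, locating config.json and checking subscription_id for leftover "<" and ">" characters, can be approximated with the standard library. This is a rough sketch under stated assumptions: find_config, check_subscription_id, and check_config are hypothetical helper names, the search order is a simplification and not the SDK's actual lookup logic, and config.json is assumed to be a JSON object with a subscription_id key.

```python
import json
from pathlib import Path


def find_config(start_dir):
    """Look for config.json in start_dir and each parent folder.

    In every folder, both <folder>/config.json and
    <folder>/aml_config/config.json are tried. Returns a Path or None.
    """
    start = Path(start_dir).resolve()
    for folder in [start, *start.parents]:
        candidates = (folder / "config.json", folder / "aml_config" / "config.json")
        for candidate in candidates:
            if candidate.is_file():
                return candidate
    return None


def check_subscription_id(value):
    """Return a list of problems found in a subscription_id string."""
    problems = []
    if not value.strip():
        problems.append("subscription_id is empty")
    if "<" in value or ">" in value:
        problems.append("remove the '<' and '>' placeholder characters")
    return problems


def check_config(start_dir):
    """Find config.json near start_dir and check its subscription_id."""
    path = find_config(start_dir)
    if path is None:
        return ["config.json not found in this folder or any parent folder"]
    config = json.loads(path.read_text())
    return check_subscription_id(config.get("subscription_id", ""))
```

Calling check_config(".") from the notebook folder returns an empty list when a config file is found and its subscription_id passes both checks. It does not verify that the workspace exists.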

TensorFlow

As of version 1.5.0 of the SDK, automated machine learning does not install TensorFlow models by default. To install TensorFlow and use it with your automated ML experiments, install tensorflow==1.12.0 via CondaDependencies.

  from azureml.core.runconfig import RunConfiguration
  from azureml.core.conda_dependencies import CondaDependencies

  # Create a run configuration and pin TensorFlow 1.12.0 as a conda dependency.
  run_config = RunConfiguration()
  run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['tensorflow==1.12.0'])

NumPy failures

  • import numpy fails in Windows: Some Windows environments see an error loading numpy with the latest Python version, 3.6.8. If you see this issue, try Python version 3.6.7.

  • import numpy fails: Check the TensorFlow version in the automated ML conda environment. Supported versions are < 1.13. Uninstall TensorFlow from the environment if the version is >= 1.13.

You can check the version of TensorFlow and uninstall it as follows:

  1. Start a command shell and activate the conda environment where the automated ML packages are installed.
  2. Enter pip freeze and look for tensorflow. If found, the version listed should be < 1.13.
  3. If the listed version is not a supported version, run pip uninstall tensorflow in the command shell and enter y for confirmation.
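
The manual check in steps 2 and 3 can also be scripted. The sketch below parses text in the name==version format that pip freeze prints. The helper names find_pinned_version and tensorflow_supported are hypothetical, and only a package named exactly tensorflow is considered.

```python
def find_pinned_version(freeze_output, package):
    """Return the version pip freeze lists for package, or None if absent."""
    wanted = package.lower().replace("_", "-")
    for line in freeze_output.splitlines():
        name, separator, version = line.strip().partition("==")
        if separator and name.lower().replace("_", "-") == wanted:
            return version
    return None


def tensorflow_supported(freeze_output):
    """True if TensorFlow is absent or its version is below 1.13."""
    version = find_pinned_version(freeze_output, "tensorflow")
    if version is None:
        return True
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (1, 13)
```

On Python 3.7 or later, the input text can be captured inside the activated conda environment with subprocess.run([sys.executable, "-m", "pip", "freeze"], capture_output=True, text=True).stdout.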

jwt.exceptions.DecodeError

Exact error message: jwt.exceptions.DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().

For SDK versions <= 1.17.0, installation might result in an unsupported version of PyJWT. Check that the PyJWT version in the automated ML conda environment is a supported version, that is, a PyJWT version < 2.0.0.

You can check the version of PyJWT as follows:

  1. Start a command shell and activate the conda environment where the automated ML packages are installed.

  2. Enter pip freeze and look for PyJWT. If found, the version listed should be < 2.0.0.

If the listed version is not a supported version:

  1. Consider upgrading to the latest version of the AutoML SDK: pip install -U azureml-sdk[automl]

  2. If that is not viable, uninstall PyJWT from the environment and install the right version as follows:

    1. Run pip uninstall PyJWT in the command shell and enter y for confirmation.
    2. Install using pip install 'PyJWT<2.0.0'.
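
The decision above can be summarized in code. This is an illustrative sketch only: pyjwt_supported and remediation_commands are hypothetical helper names, and the commands are the ones quoted in this section.

```python
def pyjwt_supported(version):
    """True if a PyJWT version string such as '1.7.1' is below 2.0.0."""
    return int(version.split(".")[0]) < 2


def remediation_commands(pyjwt_version):
    """Return the commands this guide suggests for an unsupported PyJWT.

    The result is an empty dict when the version is already supported.
    """
    if pyjwt_supported(pyjwt_version):
        return {}
    return {
        # Preferred option: upgrade the AutoML SDK.
        "preferred": ["pip install -U azureml-sdk[automl]"],
        # Fallback if upgrading is not viable: pin PyJWT below 2.0.0.
        "fallback": ["pip uninstall PyJWT", "pip install 'PyJWT<2.0.0'"],
    }
```

On Python 3.8 or later, the installed version can be read with importlib.metadata.version("PyJWT") instead of scanning the pip freeze output by hand.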

Databricks

See How to configure an automated ML experiment with Databricks.

Forecasting R2 score is always zero

This issue arises if the training data provided has a time series that contains the same value for the last n_cv_splits + forecasting_horizon data points.

If this pattern is expected in your time series, you can switch your primary metric to normalized root mean squared error.
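
R2 compares a model's squared errors with the variance of the actual values. When every actual value in the evaluated window is identical, that variance is zero and the score stops being informative. The sketch below flags series with such a constant tail before training. The helper names are hypothetical, and each series is assumed to be a plain list of values in time order.

```python
def has_constant_tail(values, n_cv_splits, forecasting_horizon):
    """True if the last n_cv_splits + forecasting_horizon points are all equal."""
    tail_length = n_cv_splits + forecasting_horizon
    if tail_length <= 0 or len(values) < tail_length:
        return False
    tail = values[-tail_length:]
    return all(point == tail[0] for point in tail)


def series_with_constant_tail(series_by_id, n_cv_splits, forecasting_horizon):
    """Return the IDs of the series whose tail is constant."""
    return [
        series_id
        for series_id, values in series_by_id.items()
        if has_constant_tail(values, n_cv_splits, forecasting_horizon)
    ]
```

Any series ID returned here is a candidate for the behavior described above, which is a signal to consider normalized root mean squared error as the primary metric.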

Failed deployment

For versions <= 1.18.0 of the SDK, the base image created for deployment may fail with the following error: ImportError: cannot import name cached_property from werkzeug.

The following steps can work around the issue:

  1. Download the model package
  2. Unzip the package
  3. Deploy using the unzipped assets
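
Step 2 can be done with Python's standard zipfile module. In this sketch, unzip_model_package is a hypothetical helper name, and the archive path and target folder are whatever you chose when downloading the package. The deployment step itself depends on your deployment method and is not shown.

```python
import zipfile
from pathlib import Path


def unzip_model_package(zip_path, target_dir):
    """Extract a downloaded model package and return the extracted file paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    # Directory entries end with "/", so only files are returned.
    return sorted(str(target / name) for name in names if not name.endswith("/"))
```

The returned list makes it easy to confirm which assets were extracted before you deploy them.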

Sample notebook failures

If a sample notebook fails with an error that a property, method, or library does not exist:

  • Ensure that the correct kernel has been selected in the Jupyter Notebook. The kernel is displayed in the top right of the notebook page. The default is azure_automl. The kernel is saved as part of the notebook. If you switch to a new conda environment, you need to select the new kernel in the notebook.

    • For Azure Notebooks, it should be Python 3.6.
    • For local conda environments, it should be the conda environment name that you specified in automl_setup.
  • Ensure that the notebook is for the SDK version that you are using:

    • Check the SDK version by executing azureml.core.VERSION in a Jupyter Notebook cell.
    • You can download previous versions of the sample notebooks from GitHub with these steps:
      1. Select the Branch button
      2. Navigate to the Tags tab
      3. Select the version
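
The SDK version check can be wrapped so that it also reports a missing SDK instead of raising. This is a small sketch: get_sdk_version is a hypothetical helper name, and azureml.core.VERSION is the attribute named in the list above.

```python
def get_sdk_version():
    """Return azureml.core.VERSION, or None if the SDK cannot be imported."""
    try:
        import azureml.core
    except ImportError:
        return None
    return azureml.core.VERSION


version = get_sdk_version()
if version is None:
    print("azureml.core is not importable from this kernel.")
else:
    print("Azure ML SDK version:", version)
```

If the import fails inside a notebook, a likely cause is that the selected kernel is not the conda environment where the SDK was installed.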

Next steps