Execute Python Script module

This article describes the Execute Python Script module in Azure Machine Learning designer.

Use this module to run Python code. For more information about the architecture and design principles of Python, see how to run Python code in Azure Machine Learning designer.

With Python, you can perform tasks that existing modules don't support, such as:

  • Visualizing data by using matplotlib.
  • Using Python libraries to enumerate datasets and models in your workspace.
  • Reading, loading, and manipulating data from sources that the Import Data module doesn't support.
  • Running your own deep learning code.

Supported Python packages

Azure Machine Learning uses the Anaconda distribution of Python, which includes many common utilities for data processing. The Anaconda version is updated automatically. The current version is:

  • Anaconda 4.5+ distribution for Python 3.6

For a complete list, see the section Preinstalled Python packages.

To install packages that aren't in the preinstalled list (for example, scikit-misc), add the following code to your script:

import os
os.system("pip install scikit-misc")

Use the following code to install packages only when they're missing. Skipping the repeated install improves performance, especially for inference:

import importlib.util
package_name = 'scikit-misc'  # name that pip uses
module_name = 'skmisc'        # name that import uses
spec = importlib.util.find_spec(module_name)
if spec is None:
    import os
    os.system(f"pip install {package_name}")

Note

If your pipeline contains multiple Execute Python Script modules that need packages that aren't in the preinstalled list, install the packages in each module.

Warning

The Execute Python Script module doesn't support installing packages that depend on extra native libraries (for example, Java or PyODBC) by using commands like "apt-get". This is because the module runs in a simple environment that has only Python preinstalled and that has non-admin permissions.
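
The install-if-missing pattern shown earlier in this section can be wrapped in a small helper, which may be convenient when several Execute Python Script modules each need the same extra package. The following is a minimal sketch, not part of the module's API: ensure_package and its installer parameter are hypothetical names, and the sketch assumes that pip is available to the interpreter that runs the script. Note that the import name (skmisc) can differ from the pip name (scikit-misc).

```python
import importlib
import importlib.util
import subprocess
import sys


def _pip_install(package_name):
    # Use the interpreter that is running the script, so the package
    # is installed into the same environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])


def ensure_package(module_name, package_name, installer=_pip_install):
    """Install package_name only if module_name isn't importable.

    Hypothetical helper. Returns True if an installation was attempted,
    False if the module was already available.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False  # Already available; skip the slow pip call.
    installer(package_name)
    # Let the import system notice the newly installed files.
    importlib.invalidate_caches()
    return True
```

In a script, you might call ensure_package("skmisc", "scikit-misc") at the start of azureml_main, before importing the package.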

Upload files

The Execute Python Script module supports uploading files by using the Azure Machine Learning Python SDK.

The following example shows how to upload an image file in the Execute Python Script module:


# The script MUST contain a function named azureml_main,
# which is the entry point for this module.

# Imports up here can be used to
import pandas as pd

# The entry point function must have two input arguments:
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):

    # Execution logic goes here
    print(f'Input pandas.DataFrame #1: {dataframe1}')

    from matplotlib import pyplot as plt
    plt.plot([1, 2, 3, 4])
    plt.ylabel('some numbers')
    img_file = "line.png"
    plt.savefig(img_file)

    from azureml.core import Run
    run = Run.get_context(allow_offline=True)
    run.upload_file(f"graphics/{img_file}", img_file)

    # Return value must be of a sequence of pandas.DataFrame
    # For example:
    #   -  Single return value: return dataframe1,
    #   -  Two return values: return dataframe1, dataframe2
    return dataframe1,

After the pipeline run is finished, you can preview the image in the right panel of the module.

Preview of uploaded image

How to configure Execute Python Script

The Execute Python Script module contains sample Python code that you can use as a starting point. To configure the Execute Python Script module, provide a set of inputs and Python code to run in the Python script text box.

  1. Add the Execute Python Script module to your pipeline.

  2. Connect any dataset from the designer that you want to use for input to the Dataset1 port. Reference this dataset in your Python script as dataframe1, the first argument of azureml_main.

    Use of a dataset is optional. You can omit it if you generate data by using Python, or if you use Python code to import the data directly into the module.

    This module supports the addition of a second dataset on Dataset2. Reference the second dataset in your Python script as dataframe2.

    Datasets stored in Azure Machine Learning are automatically converted to pandas data frames when loaded with this module.

    Execute Python input map

  3. To include new Python packages or code, connect the zipped file that contains these custom resources to the Script Bundle port. Or, if your script is larger than 16 KB, use the Script Bundle port to avoid errors like "CommandLine exceeds the limit of 16597 characters".

    1. Bundle the script and other custom resources into a zip file.
    2. Upload the zip file as a File Dataset to the studio.
    3. Drag the dataset module from the Datasets list in the left module pane of the designer authoring page.
    4. Connect the dataset module to the Script Bundle port of the Execute Python Script module.

    Any file contained in the uploaded zipped archive can be used during pipeline execution. If the archive includes a directory structure, the structure is preserved.

    Warning

    Don't use app as the name of a folder or of your script, because app is a reserved word for built-in services. You can use other names, such as app123.

    Following is a script bundle example, which contains a Python script file and a txt file:

    Script bundle example

    Following is the content of my_script.py:

    def my_func(dataframe1):
        return dataframe1
    

    Following is sample code that shows how to consume the files in the script bundle:

    import pandas as pd
    from my_script import my_func
    
    def azureml_main(dataframe1 = None, dataframe2 = None):
    
        # Execution logic goes here
        print(f'Input pandas.DataFrame #1: {dataframe1}')
    
        # Test the custom defined python function
        dataframe1 = my_func(dataframe1)
    
        # Test to read custom uploaded files by relative path
        with open('./Script Bundle/my_sample.txt', 'r') as text_file:
            sample = text_file.read()
    
        return dataframe1, pd.DataFrame(columns=["Sample"], data=[[sample]])
    
  4. In the Python script text box, type or paste a valid Python script.

    Note

    Be careful when writing your script. Make sure it contains no errors, such as syntax errors, undeclared variables, or unimported modules or functions. Pay extra attention to the preinstalled module list. To import modules that aren't listed, install the corresponding packages in your script, such as:

    import os
    os.system("pip install scikit-misc")
    

    The Python script text box is prepopulated with some instructions in comments, and sample code for data access and output. You must edit or replace this code. Follow Python conventions for indentation and casing:

    • The script must contain a function named azureml_main as the entry point for this module.
    • The entry point function must have two input arguments, Param<dataframe1> and Param<dataframe2>, even when these arguments aren't used in your script.
    • Zipped files connected to the third input port are unzipped and stored in the directory .\Script Bundle, which is also added to the Python sys.path.

    If your .zip file contains mymodule.py, import it by using import mymodule.

    You can return up to two datasets to the designer, as a sequence of pandas.DataFrame objects. You can create other outputs in your Python code and write them directly to Azure storage.

    Warning

    We don't recommend connecting to a database or other external storage from the Execute Python Script module. Use the Import Data module and the Export Data module instead.

  5. Submit the pipeline.

    All of the data and code is loaded into a virtual machine and run by using the specified Python environment.
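
To check a script bundle before you upload it, you can imitate the behavior described in the steps above on your own machine. The sketch below is a local approximation, not the designer's actual implementation: build_bundle and unpack_bundle are hypothetical helpers, and the only behavior taken from this article is that the archive is unzipped into a directory named Script Bundle that is added to sys.path.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_bundle(zip_path, files):
    """Write a zip archive from a {archive_name: text} mapping."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for archive_name, text in files.items():
            bundle.writestr(archive_name, text)


def unpack_bundle(zip_path, work_dir):
    """Extract to '<work_dir>/Script Bundle' and put that directory on sys.path."""
    bundle_dir = os.path.join(work_dir, "Script Bundle")
    with zipfile.ZipFile(zip_path) as bundle:
        bundle.extractall(bundle_dir)
    if bundle_dir not in sys.path:
        sys.path.insert(0, bundle_dir)
    return bundle_dir


# Build the two-file bundle from the example in this article.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "bundle.zip")
build_bundle(zip_path, {
    "my_script.py": "def my_func(dataframe1):\n    return dataframe1\n",
    "my_sample.txt": "hello from the bundle\n",
})
bundle_dir = unpack_bundle(zip_path, work_dir)

# The module is importable by name because bundle_dir is on sys.path.
importlib.invalidate_caches()
my_script = importlib.import_module("my_script")

# Other files are read by path, relative to the bundle directory.
with open(os.path.join(bundle_dir, "my_sample.txt")) as text_file:
    sample = text_file.read()
```

If the import or the file read fails locally, the same bundle is unlikely to work in the pipeline, so this kind of dry run can save a submission.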

Results

The results of any computations by the embedded Python code must be provided as pandas.DataFrame, which is automatically converted to the Azure Machine Learning dataset format. You can then use the results with other modules in the pipeline.

The module returns two datasets:

  • Results Dataset 1, defined by the first returned pandas data frame in a Python script.

  • Results Dataset 2, defined by the second returned pandas data frame in a Python script.
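
Before submitting a pipeline, it can help to run the entry point locally and confirm that its return value has the expected shape. The following is a minimal sketch under stated assumptions: check_outputs is a hypothetical helper, the azureml_main shown is only an example, and the rule that at least one data frame must be returned is an assumption based on the sample comments in this article.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Example entry point: pass the input through and add a summary frame.
    summary = pd.DataFrame({"rows": [len(dataframe1)]})
    return dataframe1, summary


def check_outputs(result):
    """Raise if result isn't a sequence of one or two pandas.DataFrame objects."""
    if not isinstance(result, (tuple, list)):
        raise TypeError("azureml_main must return a sequence, for example 'return df,'")
    if not 1 <= len(result) <= 2:
        raise ValueError("expected one or two DataFrames, got %d" % len(result))
    for position, item in enumerate(result, start=1):
        if not isinstance(item, pd.DataFrame):
            raise TypeError(
                "output %d is %s, not pandas.DataFrame" % (position, type(item).__name__)
            )
    return result


sample_input = pd.DataFrame({"x": [1, 2, 3]})
# outputs[0] would become Results Dataset 1, outputs[1] Results Dataset 2.
outputs = check_outputs(azureml_main(sample_input))
```

A common mistake that this check catches is returning a bare data frame (return dataframe1) instead of a one-element sequence (return dataframe1,).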

Preinstalled Python packages

The preinstalled packages are:

  • adal==1.2.2
  • applicationinsights==0.11.9
  • attrs==19.3.0
  • azure-common==1.1.25
  • azure-core==1.3.0
  • azure-graphrbac==0.61.1
  • azure-identity==1.3.0
  • azure-mgmt-authorization==0.60.0
  • azure-mgmt-containerregistry==2.8.0
  • azure-mgmt-keyvault==2.2.0
  • azure-mgmt-resource==8.0.1
  • azure-mgmt-storage==8.0.0
  • azure-storage-blob==1.5.0
  • azure-storage-common==1.4.2
  • azureml-core==1.1.5.5
  • azureml-dataprep-native==14.1.0
  • azureml-dataprep==1.3.5
  • azureml-defaults==1.1.5.1
  • azureml-designer-classic-modules==0.0.118
  • azureml-designer-core==0.0.31
  • azureml-designer-internal==0.0.18
  • azureml-model-management-sdk==1.0.1b6.post1
  • azureml-pipeline-core==1.1.5
  • azureml-telemetry==1.1.5.3
  • backports.tempfile==1.0
  • backports.weakref==1.0.post1
  • boto3==1.12.29
  • botocore==1.15.29
  • cachetools==4.0.0
  • certifi==2019.11.28
  • cffi==1.12.3
  • chardet==3.0.4
  • click==7.1.1
  • cloudpickle==1.3.0
  • configparser==3.7.4
  • contextlib2==0.6.0.post1
  • cryptography==2.8
  • cycler==0.10.0
  • dill==0.3.1.1
  • distro==1.4.0
  • docker==4.2.0
  • docutils==0.15.2
  • dotnetcore2==2.1.13
  • flask==1.0.3
  • fusepy==3.0.1
  • gensim==3.8.1
  • google-api-core==1.16.0
  • google-auth==1.12.0
  • google-cloud-core==1.3.0
  • google-cloud-storage==1.26.0
  • google-resumable-media==0.5.0
  • googleapis-common-protos==1.51.0
  • gunicorn==19.9.0
  • idna==2.9
  • imbalanced-learn==0.4.3
  • isodate==0.6.0
  • itsdangerous==1.1.0
  • jeepney==0.4.3
  • jinja2==2.11.1
  • jmespath==0.9.5
  • joblib==0.14.0
  • json-logging-py==0.2
  • jsonpickle==1.3
  • jsonschema==3.0.1
  • kiwisolver==1.1.0
  • liac-arff==2.4.0
  • lightgbm==2.2.3
  • markupsafe==1.1.1
  • matplotlib==3.1.3
  • more-itertools==6.0.0
  • msal-extensions==0.1.3
  • msal==1.1.0
  • msrest==0.6.11
  • msrestazure==0.6.3
  • ndg-httpsclient==0.5.1
  • nimbusml==1.6.1
  • numpy==1.18.2
  • oauthlib==3.1.0
  • pandas==0.25.3
  • pathspec==0.7.0
  • pip==20.0.2
  • portalocker==1.6.0
  • protobuf==3.11.3
  • pyarrow==0.16.0
  • pyasn1-modules==0.2.8
  • pyasn1==0.4.8
  • pycparser==2.20
  • pycryptodomex==3.7.3
  • pyjwt==1.7.1
  • pyopenssl==19.1.0
  • pyparsing==2.4.6
  • pyrsistent==0.16.0
  • python-dateutil==2.8.1
  • pytz==2019.3
  • requests-oauthlib==1.3.0
  • requests==2.23.0
  • rsa==4.0
  • ruamel.yaml==0.15.89
  • s3transfer==0.3.3
  • scikit-learn==0.22.2
  • scipy==1.4.1
  • secretstorage==3.1.2
  • setuptools==46.1.1.post20200323
  • six==1.14.0
  • smart-open==1.10.0
  • urllib3==1.25.8
  • websocket-client==0.57.0
  • werkzeug==0.16.1
  • wheel==0.34.2
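
Because the Anaconda version is updated automatically, the versions present at run time may differ from the list above. The following is a minimal sketch for comparing a few pins against the running environment. The names parse_pin, installed_version, and find_mismatches are hypothetical helpers, and the fallback to pkg_resources assumes that setuptools is installed, as the list above indicates.

```python
def parse_pin(requirement):
    """Split 'name==version' into (name, version)."""
    name, separator, version = requirement.partition("==")
    if not separator or not name or not version:
        raise ValueError("not a pinned requirement: %r" % requirement)
    return name.strip(), version.strip()


def installed_version(name):
    """Return the installed version of a distribution, or None if it's absent."""
    try:
        from importlib import metadata  # Python 3.8 and later
    except ImportError:
        import pkg_resources  # Fallback for older interpreters such as 3.6
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(requirements):
    """Return {name: (expected, installed)} for pins that don't match."""
    mismatches = {}
    for requirement in requirements:
        name, expected = parse_pin(requirement)
        actual = installed_version(name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches
```

For example, printing find_mismatches(["pandas==0.25.3", "numpy==1.18.2"]) inside azureml_main would show in the module's log which of those pins, if any, no longer match.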

Next steps

See the set of modules available to Azure Machine Learning.