Use private Python packages with Azure Machine Learning

APPLIES TO: Basic edition, Enterprise edition

In this article, learn how to use private Python packages securely within Azure Machine Learning. Use cases for private Python packages include:

  • You've developed a private package that you don't want to share publicly.
  • You want to use a curated repository of packages stored within an enterprise firewall.

The recommended approach depends on whether you have a few packages for a single Azure Machine Learning workspace, or an entire repository of packages for all workspaces within an organization.

Private packages are used through the Environment class. Within an environment, you declare which Python packages to use, including private ones. To learn about environments in Azure Machine Learning in general, see How to use environments.

Prerequisites

Use a small number of packages for development and testing

For a small number of private packages for a single workspace, use the static Environment.add_private_pip_wheel() method. This approach lets you quickly add a private package to the workspace, and is well suited for development and testing.

Point the file_path argument to a local wheel file and call the add_private_pip_wheel method. The method returns a URL used to track the location of the package within your workspace. Capture this storage URL and pass it to the add_pip_package() method.

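from azureml.core import Environment, Workspace
from azureml.core.conda_dependencies import CondaDependencies
ws = Workspace.from_config()
# Upload the local wheel to the workspace and capture its storage URL
whl_url = Environment.add_private_pip_wheel(workspace=ws, file_path="my-custom.whl")
myenv = Environment(name="myenv")
conda_dep = CondaDependencies()
conda_dep.add_pip_package(whl_url)  # install the private wheel via pip
myenv.python.conda_dependencies = conda_dep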
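The sample uses the placeholder name my-custom.whl. pip, however, expects wheel file names to follow the naming convention in PEP 427, so pip would reject a file with that exact name. A real file looks more like my_custom-1.0.0-py3-none-any.whl. The following is a simplified sketch of a hypothetical helper that checks a file name before you upload it. It relies only on the standard library and is not part of the Azure Machine Learning SDK.

```python
import re


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel file name into its PEP 427 parts, or raise ValueError.

    Layout: {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    Hyphens inside a distribution name are written as underscores, so a valid
    name splits on "-" into exactly 5 parts, or 6 when a build tag is present.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"{filename!r} does not end with .whl")
    parts = filename[: -len(".whl")].split("-")
    if not all(parts):
        raise ValueError(f"{filename!r} contains an empty component")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
        if not re.match(r"\d", build):
            raise ValueError(f"build tag {build!r} must start with a digit")
    else:
        raise ValueError(f"{filename!r} does not have 5 or 6 dash-separated parts")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def is_valid_wheel_filename(filename: str) -> bool:
    """Return True if parse_wheel_filename accepts the name."""
    try:
        parse_wheel_filename(filename)
    except ValueError:
        return False
    return True
```

Pass only the file name, not the full path. For example, is_valid_wheel_filename("my-custom.whl") returns False, and is_valid_wheel_filename("my_custom-1.0.0-py3-none-any.whl") returns True.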

Internally, Azure Machine Learning replaces the URL returned by add_private_pip_wheel() with a secure SAS URL, so your wheel file is kept private and secure.
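A SAS (shared access signature) URL is the plain blob URL followed by a query string that carries a signed token. In the Azure Storage SAS format, sp holds the granted permissions, se the expiry time, and sig the signature. The hypothetical helper below shows how such a URL breaks down, for example when you want to check when a link expires without printing its secret signature. It uses only the Python standard library and an invented example URL. It does not show how Azure Machine Learning generates its SAS URLs.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def describe_sas_url(url: str) -> dict:
    """Split a SAS URL into the plain blob URL and a few token fields.

    The signature itself ("sig") is only reported as present or absent,
    so the result is safe to print or log.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # values are URL-decoded lists
    return {
        # Blob URL without the query string (the SAS token)
        "blob_url": urlunsplit((parts.scheme, parts.netloc, parts.path, "", "")),
        "permissions": query.get("sp", [None])[0],  # e.g. "r" for read
        "expiry": query.get("se", [None])[0],  # signed expiry time, UTC
        "has_signature": "sig" in query,
    }
```

For an invented URL such as https://myaccount.blob.core.windows.net/mycontainer/pkg.whl?sv=2019-02-02&sp=r&se=2030-01-01T00%3A00%3A00Z&sr=b&sig=abc123, the helper reports read permission ("r") and an expiry of 2030-01-01T00:00:00Z.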

Consume a repository of packages from an Azure DevOps feed

If you're actively developing Python packages for your machine learning application, you can host them in an Azure DevOps repository as artifacts and publish them as a feed. This approach lets you integrate the DevOps workflow for building packages with your Azure Machine Learning workspace. To learn how to set up Python feeds using Azure DevOps, read Get Started with Python Packages in Azure Artifacts.

This approach uses a personal access token (PAT) to authenticate against the repository. The same approach applies to other repositories that use token-based authentication, such as private GitHub repositories.

  1. Create a personal access token (PAT) for your Azure DevOps instance. Set the scope of the token to Packaging > Read.

  2. Add the Azure DevOps URL and PAT as workspace properties, using the Workspace.set_connection method.

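    from getpass import getpass
    from azureml.core import Workspace
    
    # Prompt for the PAT without echoing it to the terminal
    pat_token = getpass("Enter secret token: ")
    ws = Workspace.from_config()
    # Register the feed's base URL and the PAT as a workspace connection
    ws.set_connection(name="connection-1",
                      category="PythonFeed",
                      target="https://<my-org>.pkgs.visualstudio.com",
                      authType="PAT",
                      value=pat_token)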
    
  3. Create an Azure Machine Learning environment and add Python packages from the feed.

    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies
    
    env = Environment(name="my-env")
    cd = CondaDependencies()
    cd.add_pip_package("<my-package>")
    # Tell pip to search the Azure DevOps feed in addition to the default index (PyPI)
    cd.set_pip_option("--extra-index-url https://<my-org>.pkgs.visualstudio.com/<my-project>/_packaging/<my-feed>/pypi/simple")
    env.python.conda_dependencies = cd
    

The environment is now ready to be used in training runs or web service endpoint deployments. When building the environment, Azure Machine Learning uses the PAT to authenticate against the feed with the matching base URL.
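The feed URL passed to set_pip_option in step 3 has to line up with the target registered through set_connection in step 2. The following sketch builds an index URL in the pattern shown in step 3 and checks it against a connection target. The matching rule it uses is an assumption for illustration only: same scheme and host, with the index path at or below the target's path. The service's actual rule may differ. The helper names and the org, project, and feed values (contoso, ml-project, ml-feed) are hypothetical.

```python
from urllib.parse import urlsplit


def feed_index_url(org: str, project: str, feed: str) -> str:
    """Build a feed index URL in the pattern used with --extra-index-url in step 3."""
    return f"https://{org}.pkgs.visualstudio.com/{project}/_packaging/{feed}/pypi/simple"


def matches_connection(index_url: str, connection_target: str) -> bool:
    """Assumed, simplified matching rule: same scheme and host, and the
    index path sits at or below the connection target's path."""
    index, target = urlsplit(index_url), urlsplit(connection_target)
    if (index.scheme, index.netloc.lower()) != (target.scheme, target.netloc.lower()):
        return False
    base_path = target.path.rstrip("/")
    return index.path == base_path or index.path.startswith(base_path + "/")
```

With these helpers, step 3 could call cd.set_pip_option("--extra-index-url " + feed_index_url("contoso", "ml-project", "ml-feed")). Comparing hosts rather than raw string prefixes avoids accepting look-alike hosts such as contoso.pkgs.visualstudio.com.example.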

Consume a repository of packages from private storage

You can consume packages from an Azure storage account within your organization's firewall. Such a storage account can hold a curated set of packages for enterprise use, or an internal mirror of publicly available packages.

To set up such private storage:

  1. Place the workspace inside a virtual network (VNET).
  2. Create a storage account and disallow public access.
  3. Place the Python packages you want to use into a container within the storage account.
  4. Allow access to the storage account from the workspace VNET.

Then, you can reference the packages in the Azure Machine Learning environment definition by their full URL in Azure Blob storage.
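Blob URLs follow the pattern https://<account>.blob.core.windows.net/<container>/<blob>. The following is a minimal sketch of a hypothetical helper that builds such a URL for a package file. It assumes the default Azure public cloud endpoint suffix, blob.core.windows.net.

```python
from urllib.parse import quote


def blob_package_url(account: str, container: str, blob_name: str) -> str:
    """Return the full Azure Blob storage URL of a package file.

    Assumes the default public cloud endpoint suffix (blob.core.windows.net).
    quote() percent-encodes unsafe characters but keeps "/" so virtual
    directories inside the container are preserved.
    """
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"
```

You could then pass the result to add_pip_package() on a CondaDependencies object, as in the earlier examples. For instance: cd.add_pip_package(blob_package_url("<my-account>", "<my-container>", "my_custom-1.0.0-py3-none-any.whl")). The account and container names here are placeholders.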

Next steps