Customize containers with Databricks Container Services

Databricks Container Services lets you specify a Docker image when you create a cluster. Some example use cases include:

  • Library customization - you have full control over the system libraries you want installed.
  • Golden container environment - your Docker image is a locked-down environment that will never change.
  • Docker CI/CD integration - you can integrate Azure Databricks with your Docker CI/CD pipelines.

You can also use Docker images to create custom deep learning environments on clusters with GPU devices. For additional information about using GPU clusters with Databricks Container Services, refer to Databricks Container Services on GPU clusters.

For tasks that must run each time the container starts, use an init script.

Requirements

Note

Databricks Runtime for Machine Learning and Databricks Runtime for Genomics do not support Databricks Container Services.

  • Databricks Runtime 6.1 or above. If you have previously used Databricks Container Services, you must upgrade your base images. Refer to the latest images tagged with 6.x in https://github.com/databricks/containers.
  • Your Azure Databricks workspace must have Databricks Container Services enabled.
  • Your machine must be running a recent Docker daemon (one that is tested and works with Client/Server Version 18.03.0-ce), and the docker command must be available on your PATH.
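The last requirement can be checked before you start building. The following is a minimal sketch, not an official Databricks tool: the helper name find_docker is hypothetical, and it only confirms that a docker executable is reachable on PATH. It does not verify the daemon version.

```python
import shutil
from typing import Optional


def find_docker(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the docker executable, or None if not found.

    `path` is a search path string; when it is None the PATH environment
    variable of the current process is used.
    """
    return shutil.which("docker", path=path)


if __name__ == "__main__":
    location = find_docker()
    if location is None:
        print("docker was not found on PATH")
    else:
        print(f"docker found at {location}")
```

Passing an explicit path is mainly useful for testing the check itself.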

Step 1: Build your base

Azure Databricks has several minimal requirements that an image must meet for a cluster to launch successfully. Because of this, we recommend that you build your Docker base from a base that Azure Databricks has built and tested:

FROM databricksruntime/standard:latest
...

To specify additional Python libraries, such as the latest versions of pandas and urllib3, use the container-specific version of pip. For the databricksruntime/standard:latest container, include the following:

RUN /databricks/conda/envs/dcs-minimal/bin/pip install pandas
RUN /databricks/conda/envs/dcs-minimal/bin/pip install urllib3
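If the package list is maintained elsewhere (for example in a CI pipeline), the Dockerfile lines above can be generated rather than written by hand. The following is a small sketch under that assumption: render_dockerfile is a hypothetical helper, and the pip path is the one shown above for the databricksruntime/standard:latest container.

```python
from typing import Iterable

# pip location inside the databricksruntime/standard image, as given in the text
PIP = "/databricks/conda/envs/dcs-minimal/bin/pip"


def render_dockerfile(
    packages: Iterable[str],
    base: str = "databricksruntime/standard:latest",
) -> str:
    """Render a Dockerfile that starts from `base` and pip-installs each package."""
    lines = [f"FROM {base}"]
    # One RUN line per package, mirroring the example in the text
    lines += [f"RUN {PIP} install {package}" for package in packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(["pandas", "urllib3"]), end="")
```

The rendered text can be written to a file named Dockerfile and built with the usual docker build command.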

Example base images are hosted on Docker Hub at https://hub.docker.com/u/databricksruntime. The Dockerfiles used to generate these bases are at https://github.com/databricks/containers.

Note

The base images databricksruntime/standard and databricksruntime/minimal are not to be confused with the unrelated databricks-standard and databricks-minimal environments that were included in Databricks Runtime with Conda (Beta), which is no longer available.

You can also build your Docker base from scratch. Your Docker image must meet these requirements:

Or, you can use the minimal image built by Databricks at databricksruntime/minimal.

The minimal requirements listed above do not include Python, R, Ganglia, and many other features that you typically expect in Azure Databricks clusters. To get these features, build off the appropriate base image (for example, databricksruntime/rbase for R), or refer to the Dockerfiles in GitHub to determine how to build in support for the specific features you want.

Warning

You now have control over the cluster's environment. With great power comes great responsibility, and with great flexibility comes ease of breakage. This document provides several recommendations based on our experience. Eventually, you will step out of known territory, and things may break. As with any Docker workflow, things may not work the first time, or the second, but once an image works it should keep working for as long as the image is unchanged.

Step 2: Push your base image

Push your custom base image to a Docker registry. This process has been tested with Docker Hub and Azure Container Registry (ACR). Docker registries that support no auth or basic auth are expected to work.

Step 3: Launch your cluster

You can launch your cluster using the UI or the API.

Launch your cluster using the UI

  1. Specify a Databricks Runtime version that supports Databricks Container Services.

  2. Select Use your own Docker container.

  3. In the Docker Image URL field, enter your custom Docker image.

    Docker image URL examples:

    • Docker Hub: <organization>/<repository>:<tag>, for example: databricksruntime/standard:latest
    • Azure Container Registry: <your-registry-name>.azurecr.io/<repository-name>:<tag>
  4. Select the authentication type.
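The two Docker image URL formats listed in step 3 can be assembled programmatically, which helps avoid typos when the value comes from a build pipeline. This is an illustrative sketch only: the function names docker_hub_image and acr_image are hypothetical, and the registry and repository names in the usage lines are placeholders.

```python
def docker_hub_image(organization: str, repository: str, tag: str) -> str:
    """Build a Docker Hub image reference: <organization>/<repository>:<tag>."""
    return f"{organization}/{repository}:{tag}"


def acr_image(registry_name: str, repository_name: str, tag: str) -> str:
    """Build an Azure Container Registry image reference:
    <your-registry-name>.azurecr.io/<repository-name>:<tag>.
    """
    return f"{registry_name}.azurecr.io/{repository_name}:{tag}"


if __name__ == "__main__":
    print(docker_hub_image("databricksruntime", "standard", "latest"))
    # "myregistry" and "myrepo" are placeholder names, not real resources
    print(acr_image("myregistry", "myrepo", "latest"))
```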

Launch your cluster using the API

  1. Generate an API token.

  2. Use the Clusters API to launch a cluster with your custom Docker base.

    curl -X POST -H "Authorization: Bearer <token>" https://<databricks-instance>/api/2.0/clusters/create -d '{
      "cluster_name": "<cluster-name>",
      "num_workers": 0,
      "node_type_id": "Standard_DS3_v2",
      "docker_image": {
        "url": "databricksruntime/standard:latest",
        "basic_auth": {
          "username": "<docker-registry-username>",
          "password": "<docker-registry-password>"
        }
      },
      "spark_version": "5.5.x-scala2.11"
    }'
    

    The basic_auth requirements depend on your Docker image type:

    • For public Docker images, do not include the basic_auth field.
    • For private Docker images, you must include the basic_auth field, using a service principal ID and password as the username and password.
    • For Azure ACR, you must include the basic_auth field, using a service principal ID and password as the username and password. See the Azure ACR service principal authentication documentation for information about creating the service principal.
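The basic_auth rules above can be encoded when the request body is built in code instead of typed into a curl command. The following sketch assumes only the fields shown in the request above. build_cluster_spec is a hypothetical helper, it performs no network call, and the cluster name used in the usage lines is a placeholder.

```python
import json
from typing import Any, Dict, Optional


def build_cluster_spec(
    cluster_name: str,
    spark_version: str,
    node_type_id: str,
    image_url: str,
    num_workers: int = 0,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for clusters/create with a custom Docker image.

    basic_auth is included only when credentials are supplied, so public
    images produce a docker_image object with just the url field.
    """
    if (username is None) != (password is None):
        raise ValueError("username and password must be given together")

    docker_image: Dict[str, Any] = {"url": image_url}
    if username is not None:
        docker_image["basic_auth"] = {"username": username, "password": password}

    return {
        "cluster_name": cluster_name,
        "num_workers": num_workers,
        "node_type_id": node_type_id,
        "docker_image": docker_image,
        "spark_version": spark_version,
    }


if __name__ == "__main__":
    # Public image: no credentials, so no basic_auth field is emitted
    spec = build_cluster_spec(
        cluster_name="example-cluster",
        spark_version="5.5.x-scala2.11",
        node_type_id="Standard_DS3_v2",
        image_url="databricksruntime/standard:latest",
    )
    print(json.dumps(spec, indent=2))
```

Serializing with json.dumps also avoids hand-written JSON mistakes such as trailing commas, which the API would reject.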

Use an init script

Databricks Container Services clusters let you include init scripts in the Docker container. In most cases, you should avoid init scripts and instead make customizations through Docker directly (using the Dockerfile). However, certain tasks must be executed when the container starts, instead of when the container is built. Use an init script for these tasks.

For example, suppose you want to run a security daemon inside a custom container. Install and build the daemon in the Docker image through your image building pipeline. Then, add an init script that starts the daemon. In this example, the init script would include a line like systemctl start my-daemon.

In the API, you can specify init scripts as part of the cluster spec as follows. For more information, see InitScriptInfo.

"init_scripts": [
    {
        "file": {
            "destination": "file:/my/local/file.sh"
        }
    }
]

For Databricks Container Services images, you can also store init scripts in DBFS or cloud storage.
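The init_scripts fragment shown earlier can be generated from a list of script locations. This sketch makes one assumption beyond the text: that a script stored in DBFS is declared under a dbfs key with a dbfs:/ destination, mirroring the file form. Check InitScriptInfo for the exact fields your workspace supports. The helper names and the DBFS path in the usage lines are hypothetical, and cloud storage locations are intentionally not covered.

```python
from typing import Dict, List

# Schemes this sketch knows how to map; "dbfs" is an assumption (see lead-in)
SUPPORTED_SCHEMES = ("file", "dbfs")


def init_script_entry(destination: str) -> Dict[str, Dict[str, str]]:
    """Map a destination such as file:/my/local/file.sh to an init_scripts entry."""
    scheme = destination.split(":", 1)[0]
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported init script destination: {destination}")
    return {scheme: {"destination": destination}}


def init_scripts(destinations: List[str]) -> List[Dict[str, Dict[str, str]]]:
    """Build the value of the init_scripts field, preserving the given order."""
    return [init_script_entry(destination) for destination in destinations]


if __name__ == "__main__":
    # The dbfs path below is a placeholder, not a real location
    print(init_scripts(["file:/my/local/file.sh", "dbfs:/example/init.sh"]))
```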

The following steps take place when you launch a Databricks Container Services cluster:

  1. VMs are acquired from the cloud provider.
  2. The custom Docker image is downloaded from your repo.
  3. Azure Databricks creates a Docker container from the image.
  4. Databricks Runtime code is copied into the Docker container.
  5. The init scripts are executed. See Init script execution order.

Azure Databricks ignores the Docker CMD and ENTRYPOINT primitives.