Single Node clusters

Important

This feature is in Public Preview.

A Single Node cluster is a cluster consisting of a Spark driver and no Spark workers. Such clusters support Spark jobs and all Spark data sources, including Delta Lake. In contrast, Standard clusters require at least one Spark worker to run Spark jobs.

Single Node clusters are helpful in the following situations:

  • Running single node machine learning workloads that need Spark to load and save data (see the sketch after this list)
  • Lightweight exploratory data analysis (EDA)
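
For example, a common pattern for the first case is to use Spark on the driver to load a Delta table, pull a manageable subset into pandas, and train with a single-node library. A minimal sketch, assuming a hypothetical Delta table at /mnt/data/events with illustrative column names, and that scikit-learn is available (it ships with Databricks Runtime ML); spark is the SparkSession predefined in Databricks notebooks:

    from sklearn.linear_model import LogisticRegression

    # Spark (running locally on the driver) loads the data; the path is illustrative.
    df = spark.read.format("delta").load("/mnt/data/events")

    # Pull a subset that fits in driver memory into pandas.
    pdf = df.select("feature_1", "feature_2", "label").limit(100_000).toPandas()

    # Train and evaluate a single-node model on the driver.
    model = LogisticRegression().fit(pdf[["feature_1", "feature_2"]], pdf["label"])
    print(model.score(pdf[["feature_1", "feature_2"]], pdf["label"]))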

Create a Single Node cluster

To create a Single Node cluster, select Single Node in the Cluster Mode drop-down list when configuring a cluster.

Single Node cluster mode
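
If you create clusters through the REST API rather than the UI, the equivalent configuration fixes the singleNode profile and a local master, as the policies later in this article also do. A minimal sketch, with placeholder workspace URL, token, and cluster name (the payload fields are an assumption based on the policy settings shown below):

    import requests

    HOST = "https://<databricks-instance>"   # placeholder workspace URL
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}

    payload = {
        "cluster_name": "single-node-demo",        # illustrative name
        "spark_version": "7.3.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_DS14_v2",
        "num_workers": 0,                          # no Spark workers
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
    }

    resp = requests.post(f"{HOST}/api/2.0/clusters/create", headers=HEADERS, json=payload)
    print(resp.json())  # contains the new cluster_id on success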

Single Node cluster properties

A Single Node cluster has the following properties:

  • Runs Spark locally with as many executor threads as logical cores on the cluster (the number of cores on the driver minus 1); see the check after this list.
  • Has 0 workers, with the driver node acting as both master and worker.
  • The executor stderr, stdout, and log4j logs are in the driver log.
  • Cannot be converted to a Standard cluster. Instead, create a new cluster with the mode set to Standard.
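
You can confirm the first two properties from a notebook attached to the cluster; sc is the SparkContext predefined in Databricks notebooks:

    # On a Single Node cluster the master is local, e.g. "local[*]".
    print(sc.master)

    # The parallelism Spark uses for local scheduling on the driver.
    print(sc.defaultParallelism)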

Limitations

  • Single Node clusters are not recommended for large scale data processing. If you exceed the resources on a Single Node cluster, we recommend using a Standard mode cluster.

  • We do not recommend sharing Single Node clusters. Since all workloads would run on the same node, users would be more likely to run into resource conflicts. Databricks recommends Standard mode for shared clusters.

  • You cannot convert a Standard cluster to a Single Node cluster by setting the minimum number of workers to 0. Instead, create a new cluster with the mode set to Single Node.

  • Single Node clusters are not compatible with process isolation.

  • Single Node clusters do not support Databricks Container Services.

  • GPU scheduling is not enabled on Single Node clusters.

  • On Single Node clusters, Spark cannot read Parquet files with a UDT column and may return the following error message:

    The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
    

    To work around this problem, set the Spark configuration spark.databricks.io.parquet.nativeReader.enabled to false:

    spark.conf.set("spark.databricks.io.parquet.nativeReader.enabled", False)
    

Single Node cluster policy

Cluster policies simplify cluster configuration for Single Node clusters.

As an illustrative example, when managing clusters for a data science team that does not have cluster creation permissions, an admin may want to authorize the team to create up to 10 Single Node interactive clusters in total. This can be done using instance pools, cluster policies, and Single Node cluster mode:

  1. Create a pool. You can set max capacity to 10, enable autoscaling local storage, and choose the instance types and Databricks Runtime version. Record the pool ID from the URL.

  2. Create a cluster policy. The values in the policy for instance pool ID and node type ID should match the pool properties. You can relax the constraints to match your needs. See Manage cluster policies.

  3. Grant the cluster policy to the team members. You can use Manage users and groups to simplify user management.

    {
      "spark_conf.spark.databricks.cluster.profile": {
        "type": "fixed",
        "value": "singleNode",
        "hidden": true
      },
      "instance_pool_id": {
        "type": "fixed",
        "value": "singleNodePoolId1",
        "hidden": true
      },
      "spark_version": {
        "type": "fixed",
        "value": "7.3.x-cpu-ml-scala2.12",
        "hidden": true
      },
      "autotermination_minutes": {
        "type": "fixed",
        "value": 120,
        "hidden": true
      },
      "node_type_id": {
        "type": "fixed",
        "value": "Standard_DS14_v2",
        "hidden": true
      },
      "num_workers": {
        "type": "fixed",
        "value": 0,
        "hidden": true
      }
    }
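
If you prefer to automate steps 2 and 3, you can create the policy with the Cluster Policies API and grant it with the Permissions API. A minimal sketch, assuming the policy JSON above is saved to a local file and using a hypothetical group name, data-science:

    import requests

    HOST = "https://<databricks-instance>"   # placeholder workspace URL
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}

    # single-node-policy.json holds the policy definition shown above,
    # passed to the API as a JSON string.
    with open("single-node-policy.json") as f:
        definition = f.read()

    resp = requests.post(
        f"{HOST}/api/2.0/policies/clusters/create",
        headers=HEADERS,
        json={"name": "Single Node policy", "definition": definition},
    )
    policy_id = resp.json()["policy_id"]

    # Grant the team CAN_USE on the policy.
    requests.patch(
        f"{HOST}/api/2.0/permissions/cluster-policies/{policy_id}",
        headers=HEADERS,
        json={
            "access_control_list": [
                {"group_name": "data-science", "permission_level": "CAN_USE"}
            ]
        },
    )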
    

Single Node job cluster policy

To set up a cluster policy for jobs, you can define a similar cluster policy. Remember to set the cluster_type "type" to "fixed" and its "value" to "job", and to remove any reference to autotermination_minutes.

{
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*]"
  },
  "instance_pool_id": {
    "type": "fixed",
    "value": "singleNodePoolId1",
    "hidden": true
  },
  "num_workers": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  },
  "spark_version": {
    "type": "fixed",
    "value": "7.3.x-cpu-ml-scala2.12",
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "Standard_DS14_v2",
    "hidden": true
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "Standard_DS14_v2",
    "hidden": true
  }
}
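
Once the job policy exists, a job can reference it by ID in its new_cluster specification. A minimal sketch of a Jobs API call, with a placeholder policy ID and an illustrative notebook path:

    import requests

    HOST = "https://<databricks-instance>"   # placeholder workspace URL
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}

    job_spec = {
        "name": "single-node-job",
        "new_cluster": {
            "policy_id": "<job-policy-id>",    # from the policy created above
            "spark_version": "7.3.x-cpu-ml-scala2.12",
            "node_type_id": "Standard_DS14_v2",
            "num_workers": 0,
            "spark_conf": {"spark.master": "local[*]"},
        },
        "notebook_task": {"notebook_path": "/Users/someone@example.com/train"},
    }

    resp = requests.post(f"{HOST}/api/2.0/jobs/create", headers=HEADERS, json=job_spec)
    print(resp.json())  # contains the job_id on success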