Pool configurations

This article explains the configuration options available when you create and edit a pool.

Configure pool

Pool size and auto termination

When you create a pool, you can control its size by setting three parameters: minimum idle instances, maximum capacity, and idle instance auto termination.
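The three size parameters can be sketched as a request body for pool creation. This is a minimal illustration, not an official client: the field names follow the Instance Pools REST API as commonly documented and should be checked against the API reference, and the helper `build_pool_spec`, the pool name, and the node type value are assumptions made for the example.

```python
import json


def build_pool_spec(name, node_type_id, min_idle=0, max_capacity=None,
                    idle_autotermination_minutes=60):
    """Return a dict shaped like an instance pool creation request body.

    Hypothetical helper: it only assembles and sanity-checks the values,
    it does not call any Databricks endpoint.
    """
    if min_idle < 0:
        raise ValueError("min_idle must be non-negative")
    if max_capacity is not None and max_capacity < min_idle:
        raise ValueError("max_capacity must be at least min_idle")

    spec = {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        "min_idle_instances": min_idle,
        "idle_instance_autotermination_minutes": idle_autotermination_minutes,
    }
    # Maximum capacity is optional, so it is only included when set.
    if max_capacity is not None:
        spec["max_capacity"] = max_capacity
    return spec


# Example values only; pick a node type available in your workspace.
spec = build_pool_spec("team-a-pool", "Standard_DS3_v2",
                       min_idle=2, max_capacity=50,
                       idle_autotermination_minutes=30)
print(json.dumps(spec, indent=2))
```

Leaving `max_capacity` out of the body mirrors the fact that the setting is optional.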

Minimum Idle Instances

The minimum number of instances the pool keeps idle. These instances do not terminate, regardless of the setting specified in Idle Instance Auto Termination. If a cluster consumes idle instances from the pool, Azure Databricks provisions additional instances to maintain the minimum.

Minimum Idle Instances configuration

Maximum Capacity

The maximum number of instances that the pool will provision. If set, this value constrains all instances (idle + used). If a cluster using the pool requests more instances than this number during autoscaling, the request fails with an INSTANCE_POOL_MAX_CAPACITY_FAILURE error.

Maximum Capacity configuration

This configuration is optional. Azure Databricks recommends setting a value only in the following circumstances:

  • You have an instance quota you must stay under.
  • You want to protect one set of work from impacting another set of work. For example, suppose your instance quota is 100 and you have teams A and B that need to run jobs. You can create pool A with a maximum capacity of 50 and pool B with a maximum capacity of 50 so that the two teams share the quota of 100 fairly.
  • You need to cap cost.
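The two-team example can be illustrated with a toy model of how a maximum capacity constrains idle plus used instances. This is a sketch under stated assumptions, not Databricks code: the `PoolModel` class and its `request` method are hypothetical, and only the error name comes from the text above.

```python
class PoolModel:
    """Toy model of a pool's capacity accounting (hypothetical, for illustration)."""

    def __init__(self, max_capacity=None, idle=0):
        self.max_capacity = max_capacity  # None means no cap is configured
        self.idle = idle
        self.used = 0

    def request(self, count):
        """Try to hand `count` instances to a cluster."""
        # Idle instances are consumed first; the rest must be provisioned.
        from_idle = min(count, self.idle)
        to_provision = count - from_idle
        total_after = self.idle + self.used + to_provision
        # The cap applies to all instances, idle and used together.
        if self.max_capacity is not None and total_after > self.max_capacity:
            return "INSTANCE_POOL_MAX_CAPACITY_FAILURE"
        self.idle -= from_idle
        self.used += count
        return "OK"


# An instance quota of 100 split evenly between two teams.
pool_a = PoolModel(max_capacity=50)
pool_b = PoolModel(max_capacity=50)

print(pool_a.request(40))  # OK
print(pool_a.request(11))  # INSTANCE_POOL_MAX_CAPACITY_FAILURE (40 + 11 > 50)
print(pool_b.request(50))  # OK, team B is unaffected by team A's demand
```

Because each pool enforces its own cap, a burst of requests from one team cannot use up the other team's share of the quota.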

Idle Instance Auto Termination

The time, in minutes, that instances above the value set in Minimum Idle Instances can be idle before the pool terminates them.

Idle Instance Auto Termination configuration
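The interaction between Minimum Idle Instances and Idle Instance Auto Termination can be sketched as follows. The function `expired_idle_instances` is hypothetical, and the choice to treat the longest-idle instances as the ones "above the minimum" is an assumption of this sketch; the text does not say how the pool picks them.

```python
def expired_idle_instances(idle_minutes, min_idle, timeout_minutes):
    """Return indices of idle instances that this model would terminate.

    idle_minutes: how long each idle instance has been idle, in minutes.
    min_idle: instances kept idle regardless of the timeout.
    timeout_minutes: the idle instance auto termination setting.
    """
    # Only instances beyond the minimum are eligible for termination.
    surplus = max(0, len(idle_minutes) - min_idle)
    # Assumption: the longest-idle instances are considered first.
    by_longest_idle = sorted(range(len(idle_minutes)),
                             key=lambda i: idle_minutes[i], reverse=True)
    candidates = by_longest_idle[:surplus]
    return sorted(i for i in candidates if idle_minutes[i] >= timeout_minutes)


idle = [5, 70, 90, 10]
print(expired_idle_instances(idle, min_idle=2, timeout_minutes=60))  # [1, 2]
print(expired_idle_instances(idle, min_idle=3, timeout_minutes=60))  # [2]
print(expired_idle_instances(idle, min_idle=4, timeout_minutes=60))  # []
```

With `min_idle=4`, nothing is terminated even though two instances exceed the timeout, which matches the rule that the minimum idle instances never terminate.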

Instance types

A pool consists of both idle instances kept ready for new clusters and instances in use by running clusters. All of these instances are of the same instance provider type, which you select when you create the pool.

A pool's instance type cannot be edited. Clusters attached to a pool use the same instance type for the driver and worker nodes. Different families of instance types fit different use cases, such as memory-intensive or compute-intensive workloads.

Instance types

Azure Databricks always provides one year's deprecation notice before ceasing support for an instance type.

Note

If your security requirements include compute isolation, select a Standard_F72s_V2 instance as your worker type. These instance types represent isolated virtual machines that consume the entire physical host and provide the level of isolation required to support, for example, US Department of Defense Impact Level 5 (IL5) workloads.

Preload Databricks Runtime version

You can speed up cluster launches by selecting a Databricks Runtime version to be loaded on idle instances in the pool. If a user selects that runtime when they create a cluster backed by the pool, that cluster launches even more quickly than a pool-backed cluster that doesn't use a preloaded Databricks Runtime version.

Preloaded runtime version

Pool tags

Pool tags allow you to monitor the cost of cloud resources used by various groups in your organization. You can specify tags as key-value pairs when you create a pool, and Azure Databricks applies these tags to cloud resources such as VMs and disk volumes.

For convenience, Azure Databricks applies three default tags to each pool: Vendor, DatabricksInstancePoolId, and DatabricksInstancePoolCreatorId. You can also add custom tags when you create a pool. You can add up to 41 custom tags.

Custom tag inheritance

Pool-backed clusters inherit default and custom tags from the pool configuration. For detailed information about how pool tags and cluster tags work together, see Monitor usage using cluster, pool, and workspace tags.
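The tag rules described above (three default tags, up to 41 custom tags, inheritance by pool-backed clusters) can be sketched in a few lines. The helper `pool_tags` is hypothetical, the IDs are placeholders, the `Vendor` value of `"Databricks"` is an assumption, and rejecting custom tags that reuse a default key is a choice made for this sketch rather than documented behavior.

```python
MAX_CUSTOM_TAGS = 41  # limit stated in the text above


def pool_tags(pool_id, creator_id, custom_tags):
    """Combine the three default pool tags with user-supplied custom tags."""
    if len(custom_tags) > MAX_CUSTOM_TAGS:
        raise ValueError(f"at most {MAX_CUSTOM_TAGS} custom tags are allowed")

    default_tags = {
        "Vendor": "Databricks",  # assumed value
        "DatabricksInstancePoolId": pool_id,
        "DatabricksInstancePoolCreatorId": creator_id,
    }
    # Sketch-only rule: refuse custom tags that collide with default keys.
    overlap = default_tags.keys() & custom_tags.keys()
    if overlap:
        raise ValueError(f"custom tags reuse default keys: {sorted(overlap)}")

    return {**default_tags, **custom_tags}


# Placeholder IDs and tag values for illustration.
tags = pool_tags("pool-123", "user-456", {"team": "A", "cost-center": "42"})

# A pool-backed cluster inherits both default and custom tags from the pool.
cluster_inherited_tags = dict(tags)
print(cluster_inherited_tags)
```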

Configure custom pool tags

  1. At the bottom of the pool configuration page, select the Tags tab.

  2. Specify a key-value pair for the custom tag.

    Tag key-value pair

  3. Click Add.

Autoscaling local storage

It is often difficult to estimate how much disk space a particular job will take. To save you from having to estimate how many gigabytes of managed disk to attach to your pool at creation time, Azure Databricks automatically enables autoscaling local storage on all Azure Databricks pools.

With autoscaling local storage, Azure Databricks monitors the amount of free disk space available on your pool's instances. If an instance runs too low on disk, a new managed disk is attached automatically before the instance runs out of disk space. Disks are attached up to a limit of 5 TB of total disk space per virtual machine (including the virtual machine's initial local storage).

The managed disks attached to a virtual machine are detached only when the virtual machine is returned to Azure. That is, managed disks are never detached from a virtual machine as long as it is part of a pool.
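The 5 TB limit can be illustrated with a small model of disk attachment. This is a sketch only: the function `disks_to_attach` is hypothetical, the 256 GB disk size is an invented example value (the text does not state the size of each attached disk), and 1 TB is taken as 1024 GB.

```python
LIMIT_GB = 5 * 1024  # 5 TB total per virtual machine, assuming 1 TB = 1024 GB


def disks_to_attach(initial_gb, required_gb, disk_gb=256, limit_gb=LIMIT_GB):
    """Return (disks attached, total disk space in GB) for one virtual machine.

    initial_gb: the virtual machine's initial local storage, which counts
                toward the limit.
    required_gb: disk space the workload ends up needing.
    disk_gb: size of each managed disk (hypothetical value).
    """
    total = initial_gb
    attached = 0
    # Keep attaching while more space is needed and the limit allows it.
    while total < required_gb and total + disk_gb <= limit_gb:
        total += disk_gb
        attached += 1
    return attached, total


print(disks_to_attach(512, 1000))   # (2, 1024)
print(disks_to_attach(512, 10000))  # (18, 5120), stops at the 5 TB limit
print(disks_to_attach(512, 100))    # (0, 512), initial storage is enough
```

The model only ever adds disks, which mirrors the statement that managed disks stay attached for as long as the virtual machine is part of a pool.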