Clusters

An Azure Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.

You run these workloads either as a set of commands in a notebook or as an automated job. Azure Databricks distinguishes between all-purpose clusters and job clusters. You use all-purpose clusters to analyze data collaboratively in interactive notebooks. You use job clusters to run fast, robust automated jobs.

  • You can create an all-purpose cluster using the UI, CLI, or REST API. You can manually terminate and restart an all-purpose cluster. Multiple users can share such a cluster for collaborative interactive analysis.
  • When you run a job on a new job cluster, the Azure Databricks job scheduler creates the cluster and terminates it when the job is complete. You cannot restart a job cluster.
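
As a rough illustration of the REST API route mentioned above, the following sketch builds (without sending) a request to the Clusters API 2.0 `clusters/create` endpoint. The helper name `build_create_cluster_request`, the workspace URL, the token, and every value in the cluster specification are placeholders chosen for this example. The runtime version and node type shown may not be available in a given workspace.

```python
import json
import urllib.request


def build_create_cluster_request(workspace_url, token, cluster_spec):
    """Build, but do not send, a POST request that creates an all-purpose cluster.

    Hypothetical helper for illustration; it is not part of any Databricks SDK.
    """
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/create"
    body = json.dumps(cluster_spec).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example values only; replace them with ones valid for your workspace.
spec = {
    "cluster_name": "shared-analysis",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "autotermination_minutes": 60,
}
request = build_create_cluster_request(
    "https://example.azuredatabricks.net",  # placeholder workspace URL
    "<personal-access-token>",              # placeholder token
    spec,
)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect before anything is created in the workspace.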

This section describes how to work with clusters using the UI. For other methods, see Clusters CLI and Clusters API.

This section also focuses more on all-purpose clusters than on job clusters, although many of the configurations and management tools described apply equally to both cluster types. To learn more about creating job clusters, see Jobs.

Important

Azure Databricks retains cluster configuration information for up to 70 all-purpose clusters terminated in the last 30 days and up to 30 job clusters recently terminated by the job scheduler. To keep an all-purpose cluster configuration after the cluster has been terminated for more than 30 days, an administrator can pin the cluster to the cluster list.
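
Pinning is described here as a UI action, but for readers who script cluster management, a minimal sketch of the equivalent call to the Clusters API 2.0 `clusters/pin` endpoint might look like this. The helper name `build_pin_cluster_request`, the workspace URL, the token, and the cluster ID are hypothetical, and the sketch assumes the token belongs to an administrator.

```python
import json
import urllib.request


def build_pin_cluster_request(workspace_url, token, cluster_id):
    """Build, but do not send, a POST request that pins a cluster.

    Hypothetical helper for illustration; the caller must have admin rights.
    """
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/pin"
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; the cluster ID below is made up for this example.
pin_request = build_pin_cluster_request(
    "https://example.azuredatabricks.net",
    "<admin-access-token>",
    "1234-567890-abcde123",
)
# Sending is left to the caller, for example: urllib.request.urlopen(pin_request)
```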

In this section: