Clusters
An Azure Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.
You run these workloads as a set of commands in a notebook or as an automated job. Azure Databricks distinguishes between all-purpose clusters and job clusters. You use all-purpose clusters to analyze data collaboratively in interactive notebooks. You use job clusters to run fast and robust automated jobs.
- You can create an all-purpose cluster using the UI, CLI, or REST API. You can manually terminate and restart an all-purpose cluster. Multiple users can share such clusters to do collaborative interactive analysis.
- The Azure Databricks job scheduler creates a job cluster when you run a job on a new job cluster and terminates the cluster when the job is complete. You cannot restart a job cluster.
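As a rough illustration of the REST API route mentioned above, the following sketch builds (but does not send) a request for the Clusters API `clusters/create` endpoint. The workspace URL, the helper name `build_create_cluster_request`, and the example values for `spark_version` and `node_type_id` are placeholders assumed for illustration; authentication is omitted, and the Clusters API reference remains the authority on the full set of fields.

```python
import json

# Placeholder workspace URL (an assumption); substitute your own workspace.
WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"


def build_create_cluster_request(name, spark_version, node_type_id,
                                 num_workers, autotermination_minutes=60):
    """Return (url, json_body) for a create-cluster call. Nothing is sent."""
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    payload = {
        "cluster_name": name,
        "spark_version": spark_version,   # a Databricks Runtime version key
        "node_type_id": node_type_id,     # an Azure VM type
        "num_workers": num_workers,
        "autotermination_minutes": autotermination_minutes,
    }
    url = f"{WORKSPACE_URL}/api/2.0/clusters/create"
    return url, json.dumps(payload)


if __name__ == "__main__":
    # Example values are illustrative only.
    url, body = build_create_cluster_request(
        "shared-analytics", "7.3.x-scala2.12", "Standard_DS3_v2", num_workers=2
    )
    print("POST", url)
    print(body)
```

To actually create the cluster, you would send the body as an HTTP POST with a bearer token for your workspace; that step is left out here so the sketch runs without credentials.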
This section describes how to work with clusters using the UI. For other methods, see Clusters CLI and Clusters API.
This section focuses more on all-purpose clusters than on job clusters, although many of the configurations and management tools described apply equally to both cluster types. To learn more about creating job clusters, see Jobs.
Important
Azure Databricks retains cluster configuration information for up to 70 all-purpose clusters terminated in the last 30 days and up to 30 job clusters recently terminated by the job scheduler. To keep an all-purpose cluster configuration even after it has been terminated for more than 30 days, an administrator can pin a cluster to the cluster list.
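The retention rule for all-purpose clusters can be pictured with a small model. This is only a sketch of the rule as stated above, not Azure Databricks code: the `TerminatedCluster` type and `retained_all_purpose` function are hypothetical, and two behaviors are assumptions that the text does not specify, namely that the most recently terminated clusters are kept first when more than 70 qualify, and that pinned clusters do not count against that limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TerminatedCluster:
    """Hypothetical record of a terminated all-purpose cluster."""
    name: str
    terminated_at: datetime
    pinned: bool = False


def retained_all_purpose(clusters, now, max_kept=70, window_days=30):
    """Model which terminated all-purpose cluster configurations are kept.

    Assumptions (not stated in the text): newest terminations win when the
    limit is exceeded, and pinned clusters are kept outside the limit.
    """
    cutoff = now - timedelta(days=window_days)
    pinned = [c for c in clusters if c.pinned]
    recent = sorted(
        (c for c in clusters if not c.pinned and c.terminated_at >= cutoff),
        key=lambda c: c.terminated_at,
        reverse=True,
    )
    return pinned + recent[:max_kept]


if __name__ == "__main__":
    now = datetime(2021, 3, 1)
    clusters = [
        TerminatedCluster("recent", now - timedelta(days=5)),
        TerminatedCluster("old", now - timedelta(days=45)),
        TerminatedCluster("old-pinned", now - timedelta(days=45), pinned=True),
    ]
    for c in retained_all_purpose(clusters, now):
        print(c.name)
```

In this model, the unpinned cluster terminated 45 days ago drops out, while the pinned cluster of the same age stays, which is the effect of pinning described above.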
In this section:
- Create a cluster
- Manage clusters
- Display clusters
- Pin a cluster
- View a cluster configuration as a JSON file
- Edit a cluster
- Clone a cluster
- Control access to clusters
- Start a cluster
- Terminate a cluster
- Delete a cluster
- View cluster information in the Apache Spark UI
- View cluster logs
- Monitor performance
- Configure clusters
- Cluster policy
- Cluster mode
- Pool
- Databricks Runtime
- Python version
- Cluster node type
- Cluster size and autoscaling
- Autoscaling local storage
- Spark configuration
- Enable local disk encryption
- Environment variables
- Cluster tags
- SSH access to clusters
- Cluster log delivery
- Init scripts
- Task preemption
- Customize containers with Databricks Container Services
- Cluster node initialization scripts
- Init script types
- Init script execution order
- Environment variables
- Logging
- Cluster-scoped init scripts
- Global init scripts (new)
- Legacy global init scripts (deprecated)
- Cluster-named init scripts (deprecated)
- Legacy global and cluster-named init script logs (deprecated)
- GPU-enabled clusters
- Single Node clusters
- Pools
- Web terminal