Azure Databricks architecture overview

The Databricks Unified Data Analytics Platform, from the original creators of Apache Spark, enables data teams to collaborate to solve some of the world’s toughest problems.

High-level architecture

Azure Databricks is structured to enable secure collaboration across cross-functional teams while Azure Databricks manages a significant share of the backend services, so you can focus on your data science, data analytics, and data engineering tasks.

Although architectures can vary depending on custom configurations (for example, when you deploy an Azure Databricks workspace to your own virtual network, also known as VNet injection), the following architecture diagram represents the most common structure and flow of data for Azure Databricks.

Figure: Databricks architecture

Azure Databricks operates out of a control plane and a data plane.

The control plane includes the backend services that Azure Databricks manages in its own Azure account. Commands that you run exist in the control plane, and your code is fully encrypted. Saved commands reside in the data plane.

The data plane is managed by your Azure account and is where your data resides. It is also where data is processed. The diagram assumes that data has already been ingested into Azure Databricks, but you can also ingest data from external sources, such as event data, streaming data, IoT data, and more. In addition, Azure Databricks connectors let you connect to external storage data sources outside your Azure account.

Your data always resides in the data plane in your Azure account, not in the control plane, so you maintain full control and ownership of your data without lock-in.
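To make the control plane and data plane split concrete, the following minimal Python sketch records which plane each component mentioned in this overview resides in. It is a hypothetical model for illustration only: the `Plane` enum, the component labels, and the `plane_for` and `components_in` helpers are assumptions made for this example and are not part of any Azure Databricks API or SDK.

```python
from enum import Enum


class Plane(Enum):
    """The two planes that Azure Databricks operates out of."""
    CONTROL = "control plane (Azure Databricks' own Azure account)"
    DATA = "data plane (your Azure account)"


# Hypothetical mapping of the components described in this overview to the
# plane where each one resides. The keys are illustrative labels only,
# not Azure Databricks identifiers.
COMPONENT_PLANE = {
    "backend_services": Plane.CONTROL,  # managed by Azure Databricks
    "running_commands": Plane.CONTROL,  # commands you run; code is encrypted
    "saved_commands": Plane.DATA,
    "data": Plane.DATA,                 # your data never leaves your account
    "data_processing": Plane.DATA,
}


def plane_for(component: str) -> Plane:
    """Return the plane for a modeled component; raise KeyError if unknown."""
    return COMPONENT_PLANE[component]


def components_in(plane: Plane) -> list[str]:
    """List the modeled components that reside in the given plane."""
    return [name for name, p in COMPONENT_PLANE.items() if p is plane]


if __name__ == "__main__":
    for plane in Plane:
        print(f"{plane.value}: {', '.join(components_in(plane))}")
```

Encoding the placement as a lookup table makes the key property of this architecture easy to check: in this model, your data and its processing never map to the control plane.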

For more architecture information, see Manage virtual networks.