Secure cluster connectivity (No Public IP / NPIP)

With secure cluster connectivity enabled, customer virtual networks have no open ports and Databricks Runtime cluster nodes have no public IP addresses. Secure cluster connectivity is also known as No Public IP (NPIP).

  • At the network level, each cluster initiates a connection to the control plane secure cluster connectivity relay (proxy) during cluster creation. The cluster establishes this connection over port 443 (HTTPS), using a different IP address from the one used for the web application and REST API.
  • Actions that the control plane logically initiates, such as starting new Databricks Runtime jobs or performing cluster administration, are sent as requests to the cluster through this reverse tunnel.
  • The data plane (the VNet) has no open ports, and Databricks Runtime cluster nodes have no public IP addresses.
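The connection direction described in the list above can be illustrated with a toy model. This is not Databricks code: `relay`, `cluster_node`, and `run_demo` are hypothetical names, the loopback address stands in for the relay, and plain TCP stands in for HTTPS on port 443. The sketch only shows that the node dials out and never listens, while requests from the "control plane" travel back over that same connection.

```python
import socket
import threading


def relay(server_sock, commands, results):
    """Toy relay: wait for the node to dial in, then send requests back
    over the connection that the node opened (the reverse tunnel)."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for command in commands:
            conn.sendall((command + "\n").encode("utf-8"))
            results.append(reader.readline().strip())


def cluster_node(relay_port):
    """Toy cluster node: it opens no listening port, it only dials out."""
    sock = socket.create_connection(("127.0.0.1", relay_port))
    with sock, sock.makefile("r", encoding="utf-8") as reader:
        for line in reader:  # ends when the relay closes the connection
            reply = "done: " + line.strip() + "\n"
            sock.sendall(reply.encode("utf-8"))


def run_demo(commands):
    """Run one relay and one node on loopback and return the node's replies."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))  # any free port, demo only
        server_sock.listen(1)
        port = server_sock.getsockname()[1]
        results = []
        relay_thread = threading.Thread(
            target=relay, args=(server_sock, commands, results)
        )
        node_thread = threading.Thread(target=cluster_node, args=(port,))
        relay_thread.start()
        node_thread.start()
        relay_thread.join()
        node_thread.join()
    return results


if __name__ == "__main__":
    print(run_demo(["start job", "resize cluster"]))
```

Only the relay side ever calls `listen`, which mirrors why the data plane needs no inbound ports in this design.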

Note

Whether or not secure cluster connectivity is enabled, all Azure Databricks network traffic between the data plane VNet and the Azure Databricks control plane travels over the Microsoft network backbone, not the public internet.

Benefits:

  • Easy network administration: Less complexity, because there is no need to configure ports on security groups or to configure network peering.
  • Easier approval: Because of better security and simpler network administration, information security teams can more easily approve Databricks as a PaaS provider.

[Figure: Secure cluster connectivity]

Using secure cluster connectivity

To use secure cluster connectivity with an Azure Databricks workspace, use either of the following options:

  • Azure portal: When you provision the workspace, go to the Networking tab and set the option Deploy Azure Databricks workspace with Secure Cluster Connectivity (No Public IP) to Yes.
  • ARM templates: For the Microsoft.Databricks/workspaces resource that creates your new workspace, set the enableNoPublicIp Boolean parameter to true.

Important

In both cases, you must register the Azure resource provider Microsoft.ManagedIdentity in the Azure subscription that you will use to launch a workspace with secure cluster connectivity. This is a one-time operation per subscription. For instructions, see Azure resource providers and types.
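One possible way to perform the registration is with the Azure CLI. The sketch below only assembles the commands and prints them; it does not run them. The helper name `provider_commands` and the placeholder subscription ID are illustrative assumptions, and the linked instructions remain the authoritative procedure.

```python
import shlex

# Resource provider named in the note above.
PROVIDER_NAMESPACE = "Microsoft.ManagedIdentity"


def provider_commands(subscription_id):
    """Return (register, check) Azure CLI argument lists for one subscription.

    The subscription ID is supplied by the caller; nothing is executed here.
    """
    register = [
        "az", "provider", "register",
        "--namespace", PROVIDER_NAMESPACE,
        "--subscription", subscription_id,
    ]
    check = [
        "az", "provider", "show",
        "--namespace", PROVIDER_NAMESPACE,
        "--subscription", subscription_id,
        "--query", "registrationState",
        "--output", "tsv",
    ]
    return register, check


if __name__ == "__main__":
    # Placeholder subscription ID, for illustration only.
    for command in provider_commands("00000000-0000-0000-0000-000000000000"):
        print(shlex.join(command))
```

Because registration is needed only once per subscription, running the check command first avoids a redundant registration request.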

Note

Secure cluster connectivity is available only for new workspaces. To migrate existing workspaces that use public IPs, create new workspaces with secure cluster connectivity enabled and migrate your resources to them. Contact your Microsoft or Databricks account team for details.

If you're using ARM templates, add the enableNoPublicIp parameter to the template that matches your network setup: one in which Azure Databricks creates a default (managed) virtual network for the workspace, or one that uses your own virtual network, also known as VNet injection. VNet injection is an optional feature that allows you to provide your own VNet to host new Azure Databricks clusters.
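As a rough illustration of where the parameter sits, the sketch below builds a Microsoft.Databricks/workspaces resource fragment as a Python dictionary and prints it as JSON. Only the resource type and enableNoPublicIp come from this article. The helper name, the API version, the SKU, the `{"value": ...}` wrapper, and the VNet injection parameter names (customVirtualNetworkId, customPublicSubnetName, customPrivateSubnetName) are assumptions that should be checked against the template you actually deploy.

```python
import json


def workspace_resource(name, location, managed_rg_id, custom_vnet=None):
    """Build an ARM resource fragment for a workspace with NPIP enabled.

    custom_vnet is None for the default (managed) VNet, or a dict with
    'vnet_id', 'public_subnet', and 'private_subnet' for VNet injection.
    """
    parameters = {"enableNoPublicIp": {"value": True}}
    if custom_vnet is not None:
        # Assumed VNet injection parameter names; verify before use.
        parameters["customVirtualNetworkId"] = {"value": custom_vnet["vnet_id"]}
        parameters["customPublicSubnetName"] = {"value": custom_vnet["public_subnet"]}
        parameters["customPrivateSubnetName"] = {"value": custom_vnet["private_subnet"]}
    return {
        "type": "Microsoft.Databricks/workspaces",
        "apiVersion": "2018-04-01",  # assumed API version
        "name": name,
        "location": location,
        "sku": {"name": "premium"},  # assumed SKU
        "properties": {
            "managedResourceGroupId": managed_rg_id,
            "parameters": parameters,
        },
    }


if __name__ == "__main__":
    # All names and IDs below are placeholders.
    fragment = workspace_resource(
        "example-workspace",
        "westeurope",
        "/subscriptions/<subscription-id>/resourceGroups/example-managed-rg",
    )
    print(json.dumps(fragment, indent=2))
```

Passing `custom_vnet` switches the same fragment to the VNet injection shape, which keeps the enableNoPublicIp setting identical across both deployment styles.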

Egress from workspace subnets

When you enable secure cluster connectivity, both of your workspace subnets are private subnets, because cluster nodes do not have public IP addresses.

The implementation details of network egress depend on whether you use the default (managed) VNet or use the optional VNet injection feature to provide your own VNet in which to deploy your workspace. See the following sections for details.

Important

Managing egress traffic to support secure cluster connectivity can incur additional costs. A smaller organization that needs a cost-optimized solution can choose to deploy its workspace with secure cluster connectivity disabled. However, for the most secure deployment, Microsoft and Databricks strongly recommend enabling secure cluster connectivity for new workspaces.

Egress with default (managed) VNet

If you use the secure cluster connectivity (No Public IP / NPIP) feature with the default VNet that Azure Databricks creates, Azure Databricks automatically creates a NAT gateway for outbound traffic from your workspace's subnets to the Azure backbone and public network. The NAT gateway is created within the managed resource group that Azure Databricks creates and manages. You cannot modify this resource group or any resources provisioned in it.

The automatically created NAT gateway incurs an additional cost.

Egress with VNet injection

If you use the secure cluster connectivity (No Public IP / NPIP) feature with optional VNet injection to provide your own VNet, you have several options for controlling egress.

Choose one of the following solutions to control egress traffic and ensure that your workspace has a stable egress public IP:

  • Egress load balancer: The recommended solution for simpler deployments is an egress load balancer, also called an outbound load balancer, whose configuration is managed by Azure Databricks. It provides a stable egress public IP for your workspace clusters, but you cannot modify the configuration for custom egress needs. This is an Azure template-only solution with the following requirements:
    • Azure Databricks expects additional fields in the ARM template that creates the workspace: loadBalancerName (load balancer name), loadBalancerBackendPoolName (load balancer backend pool name), loadBalancerFrontendConfigName (load balancer frontend configuration name), and loadBalancerPublicIpName (load balancer public IP name).
    • Azure Databricks expects the Microsoft.Databricks/workspaces resource to have the parameters loadBalancerId (load balancer ID) and loadBalancerBackendPoolName (load balancer backend pool name).
    • Azure Databricks does not support changing the configuration of the load balancer.
  • NAT gateway: Configure an Azure NAT gateway on both of the workspace's subnets to ensure that all outbound traffic to the Azure backbone and public network transits through it. This also provides a stable egress public IP for your workspace's clusters, and you can modify the configuration for custom egress needs, within what Azure networking supports. You can implement this solution using either an Azure template or the Azure portal.
  • Egress firewall or custom appliance: If you use VNet injection with an egress firewall such as Azure Firewall, or with another custom networking architecture, you can use custom routes, also known as user-defined routes (UDRs). UDRs ensure that network traffic is routed correctly for your workspace, either directly to the required endpoints or through an egress firewall. If you use such a solution, you must add direct routes or allowed firewall rules for the Azure Databricks secure cluster connectivity relay and the other required endpoints listed at User-defined route settings for Azure Databricks.
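For the egress load balancer option above, the two workspace parameters can be sketched as follows. The parameter names loadBalancerId and loadBalancerBackendPoolName come from the list. The helper names, the `{"value": ...}` wrapper, and all sample values are assumptions made for illustration. The ID follows the general Azure resource ID pattern for Microsoft.Network/loadBalancers.

```python
def load_balancer_id(subscription_id, resource_group, load_balancer_name):
    """Compose an Azure resource ID for a load balancer."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/loadBalancers/{load_balancer_name}"
    )


def load_balancer_parameters(subscription_id, resource_group,
                             load_balancer_name, backend_pool_name):
    """Return the workspace parameters that reference the egress load balancer.

    The {"value": ...} wrapper is an assumption about the template shape.
    """
    return {
        "loadBalancerId": {
            "value": load_balancer_id(
                subscription_id, resource_group, load_balancer_name
            )
        },
        "loadBalancerBackendPoolName": {"value": backend_pool_name},
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(load_balancer_parameters(
        "00000000-0000-0000-0000-000000000000",
        "example-rg",
        "example-lb",
        "example-backend-pool",
    ))
```

These entries would sit alongside enableNoPublicIp in the workspace resource's parameters, since this option applies only to ARM template deployments.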