Networking recommendations for Lakehouse Federation

This article provides guidance for setting up a viable network path between your Azure Databricks clusters or SQL warehouses and the external database system that you are connecting to using Lakehouse Federation.

Keep the following in mind:

  • All network traffic flows directly between Azure Databricks clusters (or SQL warehouses) and the external database system. Neither Unity Catalog nor the Azure Databricks control plane is on the network path.
  • Azure Databricks compute (that is, clusters and SQL warehouses) always deploys in the cloud, but the external database system can be on-premises or hosted on any cloud provider, as long as there's a viable network path between your Azure Databricks compute and the external database.
  • If you have inbound or outbound network restrictions on either Azure Databricks compute or the external database system, refer to the following sections for general guidance to help you create a viable network path.

For more information on networking in Azure Databricks workspaces, see Networking.

Database system and Azure Databricks compute both accessible from internet

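If both the external database system and the Azure Databricks compute are accessible from the internet, the connection should work without any additional network configuration.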
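To confirm that a network path exists, you can attempt a plain TCP connection from a notebook attached to the compute in question. The following is a minimal sketch, not an official diagnostic: the helper name `can_reach` and the commented-out hostname and port are hypothetical placeholders. A successful TCP connection shows only that the path is open; it says nothing about authentication or TLS.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers DNS failures, refused connections, and timeouts.
        return False


# Hypothetical values; replace with your database's hostname and port.
# print(can_reach("mydb.example.com", 5432))
```

If the check returns False, work through the sections below to identify which side is blocking the connection.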

Database system has network access restrictions

If the external database system has inbound or outbound network access restrictions and the Azure Databricks cluster or SQL warehouse is accessible from the internet, then configure one of the following network solutions to connect from classic compute resources:

  • Stable egress IP on Azure Databricks compute

    From the classic compute plane, set up a stable IP address with a load balancer, NAT gateway, internet gateway, or equivalent, and connect it to the subnet where Azure Databricks compute is deployed. This allows the compute resource to share a stable public IP address that can be allowlisted on the external database side.

  • Private Link (only when the external database is on the same cloud as Azure Databricks compute)

    From the classic compute plane, configure a Private Link connection between the network where the database is deployed and the network where Azure Databricks compute is deployed.
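For the stable egress IP option, it can help to check that the public IP address you assigned is actually covered by the allowlist entries configured on the database side. The sketch below uses Python's standard `ipaddress` module; the function name `is_allowlisted` and the addresses shown are illustrative assumptions (documentation-range IPs), not values from any real deployment.

```python
import ipaddress


def is_allowlisted(egress_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if egress_ip falls inside any of the allowed CIDR ranges."""
    ip = ipaddress.ip_address(egress_ip)
    # strict=False tolerates entries such as "203.0.113.5/24" with host bits set.
    return any(
        ip in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs
    )


# Hypothetical allowlist copied from the database firewall configuration.
allowlist = ["198.51.100.7/32", "203.0.113.0/24"]
print(is_allowlisted("203.0.113.10", allowlist))
```

This compares values locally only. It does not query the database firewall or discover the egress IP for you.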

Azure Databricks compute has network access restrictions

If the external database system is accessible from the internet and the Azure Databricks compute has inbound or outbound network access restrictions (which is only possible if you are on a customer-managed network), then use one of the following configurations:

  • Allowlist the hostname of the external database in the firewall rules of the subnet where Azure Databricks compute is deployed.

    If you choose to allowlist the external database IP address rather than hostname, make sure that the external database has a stable IP address.

  • Private Link (only when the external database is on the same cloud as Azure Databricks compute)

    Configure a Private Link connection between the network where the database is deployed and the network where Azure Databricks compute is deployed.

Azure Databricks compute has a custom DNS server

If the external database system is accessible from the internet and the Azure Databricks compute has a custom DNS server (which is only possible if you are on a customer-managed network), add the database system's hostname to your custom DNS server so that it can be resolved.
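To check whether the compute can resolve the database hostname through the DNS server it is configured to use, you can run a lookup from a notebook. This is a minimal sketch using the standard `socket` module; `resolve_host` and the commented-out hostname are hypothetical names chosen for illustration.

```python
import socket


def resolve_host(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses that hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The resolver could not find the name; return an empty result.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Hypothetical hostname; replace with your database's hostname.
# print(resolve_host("mydb.example.com"))
```

An empty list suggests that the hostname is missing from the DNS server. Running the lookup at different times can also indicate whether the database's IP addresses are stable enough to allowlist by IP rather than by hostname.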