Federated queries (Lakehouse Federation)

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 13.3 LTS and above check marked yes Unity Catalog only

Query federation allows Azure Databricks to execute queries against data served by other Azure Databricks metastores as well as many third-party database management systems (DBMS) such as PostgreSQL, mySQL, and Snowflake.

To query data from another system you must:

  1. Create a foreign connection. This registers the specific federated server with Unity Catalog and establishes means to communicate with it, such as the URL, port, and credentials used.
  2. Register foreign catalogs from the federated server with Unity Catalog
  3. Grant users access to the foreign catalogs. This can be done at the catalog, schema, or table level as you would do with regular securables.

You can now issue queries across the various local and foreign relations.

Foreign connection

A foreign connection is a Unity Catalog securable object that identifies a foreign server. As part of CREATE CONNECTION, you specify the URL where the server can be accessed.

You must also supply options such as the username and password or other accepted authentication, which Azure Databricks will use to communicate.

Foreign catalog

Given a foreign connection which supports three-level namespaces (catalog/database.schema.table) you can register entire catalogs with Unity Catalog using the CREATE FOREIGN CATALOG command. Azure Databricks keeps the definition of the catalog's schemas and their relations in sync with the foreign source.

Examples

-- Create a postgresql connection
> CREATE CONNECTION postgresql_connection
    TYPE POSTGRESQL
    OPTIONS (
       host 'qf-postgresql-demo.xxxxxx.cn-north-2.rds.amazonaws.com',
       port '5432',
       user 'postgresql_user',
       password 'password123');

-- Alternatively create a postgresql connection with secret scope
> CREATE CONNECTION postgresql_connection
    TYPE POSTGRESQL
    OPTIONS (
       host 'qf-postgresql-demo.xxxxxx.cn-north-2.rds.amazonaws.com',
       port '5432',
       user secret('secrets.r.us', 'postgresUser'),
       password secret('secrets.r.us', 'postgresPassword'));

-- Expose the "postgresdb" database with schemas and tables postgresql_user can access.
> CREATE FOREIGN CATALOG postgresql_catalog
    USING CONNECTION postgresql_connection
    OPTIONS (database 'postgresdb');

-- Execute a query across tables in the above catalog, schema, and table.
> SELECT * FROM postgresql_catalog.a_schema.table1
  UNION ALL
  SELECT * FROM default.postgresql_schema.table2
  UNION ALL
  SELECT * FROM default.postgresql.mytable
  UNION ALL
  SELECT local_table;
  ...