Federated queries (Lakehouse Federation)
Applies to: Databricks SQL Databricks Runtime 13.3 LTS and above Unity Catalog only
Query federation allows Azure Databricks to execute queries against data served by other Azure Databricks metastores as well as many third-party database management systems (DBMS) such as PostgreSQL, mySQL, and Snowflake.
To query data from another system you must:
- Create a foreign connection. This registers the specific federated server with Unity Catalog and establishes means to communicate with it, such as the URL, port, and credentials used.
- Register foreign catalogs from the federated server with Unity Catalog
- Grant users access to the foreign catalogs. This can be done at the catalog, schema, or table level as you would do with regular securables.
You can now issue queries across the various local and foreign relations.
Foreign connection
A foreign connection is a Unity Catalog securable object that identifies a foreign server. As part of CREATE CONNECTION, you specify the URL where the server can be accessed.
You must also supply options such as the username and password or other accepted authentication, which Azure Databricks will use to communicate.
Foreign catalog
Given a foreign connection which supports three-level namespaces (catalog/database.schema.table
) you can register entire catalogs with Unity Catalog using the CREATE FOREIGN CATALOG command.
Azure Databricks keeps the definition of the catalog's schemas and their relations in sync with the foreign source.
Examples
-- Create a postgresql connection
> CREATE CONNECTION postgresql_connection
TYPE POSTGRESQL
OPTIONS (
host 'qf-postgresql-demo.xxxxxx.cn-north-2.rds.amazonaws.com',
port '5432',
user 'postgresql_user',
password 'password123');
-- Alternatively create a postgresql connection with secret scope
> CREATE CONNECTION postgresql_connection
TYPE POSTGRESQL
OPTIONS (
host 'qf-postgresql-demo.xxxxxx.cn-north-2.rds.amazonaws.com',
port '5432',
user secret('secrets.r.us', 'postgresUser'),
password secret('secrets.r.us', 'postgresPassword'));
-- Expose the "postgresdb" database with schemas and tables postgresql_user can access.
> CREATE FOREIGN CATALOG postgresql_catalog
USING CONNECTION postgresql_connection
OPTIONS (database 'postgresdb');
-- Execute a query across tables in the above catalog, schema, and table.
> SELECT * FROM postgresql_catalog.a_schema.table1
UNION ALL
SELECT * FROM default.postgresql_schema.table2
UNION ALL
SELECT * FROM default.postgresql.mytable
UNION ALL
SELECT local_table;
...