Query external databases using the remote_query function
Important
This feature is in Public Preview.
The remote_query table-valued function (TVF) enables you to run SQL queries directly against external databases and data warehouses from within Azure Databricks, using the native SQL syntax of the remote system. This function provides a flexible alternative to query federation, allowing you to run queries written in the remote database's dialect without needing to translate them to Databricks SQL.
remote_query compared to query federation
The following table summarizes the key differences between the remote_query function and query federation:
| Attribute | `remote_query` function | Query federation |
|---|---|---|
| Query syntax | Write queries using the native SQL dialect of the remote database (for example, Oracle PL/SQL, BigQuery SQL). | Write queries using Databricks SQL syntax. Databricks translates and pushes down compatible operations to the remote database. |
| Use case | Run queries written in the remote system's dialect without translating them to Databricks SQL. | Query remote tables through foreign catalogs as if they were local Databricks tables. |
| Access control | Users need the `USE CONNECTION` privilege on the connection. Permission can be delegated through views. | Users need table-level privileges on foreign catalog objects. Fine-grained control. |
Before you begin
Workspace requirements:
- Workspace enabled for Unity Catalog.
Compute requirements:
- Network connectivity from your Databricks Runtime cluster or SQL warehouse to the target database systems. See Networking recommendations for Lakehouse Federation.
- Azure Databricks clusters must use Databricks Runtime 17.3 or above.
- SQL warehouses must be Pro and use version 2025.35 or above.
Permissions required:
- To create a connection, you must be a metastore admin or a user with the `CREATE CONNECTION` privilege on the Unity Catalog metastore.
- To use the `remote_query` function, you must have the `USE CONNECTION` privilege on the connection or the `SELECT` privilege on a view that wraps the function. Single-user clusters also require the `MANAGE` permission on the connection.
Create a connection
To use the remote_query function, you first need to create a Unity Catalog connection to your external database. If you already have a connection created for query federation, you can reuse it.
The remote_query function supports the following connection types:

- MySQL
- PostgreSQL
- Microsoft SQL Server
- Amazon Redshift
- Teradata
- Oracle
- Snowflake
For information about managing existing connections, see Manage connections for Lakehouse Federation.
Grant connection access
To use the remote_query function, you must have the USE CONNECTION privilege on the connection (or the MANAGE privilege on single-user clusters).
```sql
GRANT USE CONNECTION ON CONNECTION <connection-name> TO <user-or-group>;
```
Use the remote_query function
The remote_query function runs a query on the remote database and returns the results as a table that you can use in Databricks SQL queries.
Syntax
```sql
SELECT * FROM remote_query(
  '<connection-name>',
  <option-key> => '<option-value>'
  [, <option-key> => '<option-value>' ...]
)
```
Required parameters
- `connection-name`: The name of the Unity Catalog connection to use.
All other required parameters vary by connection type. See Connector-specific options for details.
Connector-specific options
The available options vary by connection type. The following tables describe the options for each connector.
MySQL, PostgreSQL, SQL Server, Redshift, and Teradata
| Parameter | Required | Description |
|---|---|---|
| `database` | Yes | The database name on the remote system. |
| `query` | Yes (or `dbtable`) | A SQL query string to run on the remote database. Cannot be used with `dbtable`. |
| `dbtable` | Yes (or `query`) | The table name to query. Cannot be used with `query`. |
| `fetchsize` | No | The number of rows to fetch per round trip. Larger values can improve performance but use more memory. Default: 0 (use the driver default). |
| `partitionColumn` | No | A column with uniformly distributed values to use for parallel data fetching. Must be used with `lowerBound`, `upperBound`, and `numPartitions`. Cannot be used with the `query` option. |
| `lowerBound` | No | The minimum value of the partition column. Must be used with `partitionColumn`, `upperBound`, and `numPartitions`. |
| `upperBound` | No | The maximum value of the partition column. Must be used with `partitionColumn`, `lowerBound`, and `numPartitions`. |
| `numPartitions` | No | The number of parallel connections to use for fetching data. Avoid very high values (hundreds), which can overload the remote database. Must be used with `partitionColumn`, `lowerBound`, and `upperBound`. |
Note
When you use partition parameters, all four (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) must be specified together, and you must use the `dbtable` option instead of `query`.
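To picture how the four partition parameters drive parallel fetching, the following Python sketch generates the kind of per-partition range predicates a JDBC-style reader issues. It mirrors the common Spark JDBC range-partitioning scheme and is illustrative only, not the exact clauses Databricks generates; the function name `partition_predicates` is hypothetical.

```python
def partition_predicates(column: str, lower: int, upper: int, num_partitions: int) -> list[str]:
    """Split [lower, upper) into num_partitions range predicates on `column`."""
    stride = (upper - lower) // num_partitions
    predicates = []
    bound = lower
    for i in range(num_partitions):
        lo = f"{column} >= {bound}" if i > 0 else None
        bound += stride
        hi = f"{column} < {bound}" if i < num_partitions - 1 else None
        clause = " AND ".join(p for p in (lo, hi) if p)
        # The first partition is unbounded below and the last unbounded above,
        # so rows outside [lower, upper) are still fetched. (Spark's real reader
        # also adds an IS NULL check to the first partition; omitted here.)
        predicates.append(clause or "1=1")
    return predicates

# Example: transaction_id from 0 to 1,000,000 across 4 parallel fetches.
for p in partition_predicates("transaction_id", 0, 1_000_000, 4):
    print(p)
```

Each predicate becomes the `WHERE` clause of a separate remote fetch, which is why `lowerBound` and `upperBound` shape the stride rather than filter rows.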
Oracle
| Parameter | Required | Description |
|---|---|---|
| `serviceName` | Yes | The Oracle service name (used instead of `database`). |
| `query` | Yes (or `dbtable`) | A SQL query string to run on the remote database. Cannot be used with `dbtable`. |
| `dbtable` | Yes (or `query`) | The table name to query. Cannot be used with `query`. |
| `fetchsize` | No | The number of rows to fetch per round trip. Larger values can improve performance but use more memory. Default: 0 (use the driver default). |
| `partitionColumn` | No | A column with uniformly distributed values to use for parallel data fetching. Must be used with `lowerBound`, `upperBound`, and `numPartitions`. Cannot be used with the `query` option. |
| `lowerBound` | No | The minimum value of the partition column. Must be used with `partitionColumn`, `upperBound`, and `numPartitions`. |
| `upperBound` | No | The maximum value of the partition column. Must be used with `partitionColumn`, `lowerBound`, and `numPartitions`. |
| `numPartitions` | No | The number of parallel connections to use for fetching data. Avoid very high values (hundreds), which can overload the remote database. Must be used with `partitionColumn`, `lowerBound`, and `upperBound`. |
Note
When you use partition parameters, all four (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) must be specified together, and you must use the `dbtable` option instead of `query`.
Snowflake
| Parameter | Required | Description |
|---|---|---|
| `database` | Yes | The database name in Snowflake. |
| `query` | Yes (or `dbtable`) | A SQL query string to run on the remote database. Cannot be used with `dbtable`. |
| `dbtable` | Yes (or `query`) | The table name to query (single-part or multi-part name). Cannot be used with `query`. |
| `schema` | No | The schema name in Snowflake. Default: `public`. |
| `query_timeout` | No | The query timeout in seconds. Default: 0 (no timeout). |
| `partition_size_in_mb` | No | The expected partition size in megabytes for parallel data fetching. Default: 100 MB. |
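The `partition_size_in_mb` option trades parallelism against per-partition overhead: smaller partitions mean more parallel fetches. As a rough mental model only (an assumption for illustration, not the connector's documented algorithm, and `estimated_partitions` is a hypothetical helper), the number of fetch partitions is about the result size divided by the partition size:

```python
import math

def estimated_partitions(result_size_mb: float, partition_size_in_mb: int = 100) -> int:
    """Rough count of parallel fetches for a result of result_size_mb megabytes."""
    return max(1, math.ceil(result_size_mb / partition_size_in_mb))

print(estimated_partitions(950))       # about 10 partitions at the 100 MB default
print(estimated_partitions(950, 250))  # fewer, larger partitions
```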
Additional pushdown control options
You can combine the remote_query function with Databricks SQL operations, and most of those operations can be pushed down as well. You can also control which Databricks SQL operations can be pushed down. These options apply to all connection types and are case-insensitive.
| Parameter | Default | Description |
|---|---|---|
| `pushdown.limit.enabled` | `true` | Enable or disable pushing down `LIMIT` clauses to the remote database. |
| `pushdown.offset.enabled` | `true` | Enable or disable pushing down `OFFSET` clauses to the remote database. |
| `pushdown.filters.enabled` | `true` | Enable or disable pushing down `WHERE` filters to the remote database. |
| `pushdown.aggregates.enabled` | `true` | Enable or disable pushing down aggregate functions (`COUNT`, `SUM`, `AVG`, `MAX`, `MIN`) to the remote database. |
| `pushdown.sortLimit.enabled` | `true` | Enable or disable pushing down top-N queries (a combination of `ORDER BY` and `LIMIT`) to the remote database. |
By default, all pushdowns are enabled. You can disable specific pushdowns if needed for troubleshooting or to work around compatibility issues with specific remote databases.
Delegate access through views
You can delegate access to remote data without granting users direct USE CONNECTION privileges by wrapping the remote_query function in a view. This approach has the following benefits:
- Simplified access control: Grant the `SELECT` privilege on the view instead of managing `USE CONNECTION` privileges.
- Data security: Control which columns and rows users can access by defining the view query.
- Lineage tracking: Track data access through view lineage rather than direct connection usage.
To delegate access through a view:
1. Create a view that calls the `remote_query` function:

   ```sql
   CREATE VIEW sales_data_view AS
   SELECT * FROM remote_query(
     'my_connection',
     database => 'sales_db',
     query => 'SELECT region, product, revenue FROM sales'
   );
   ```

2. Grant the `SELECT` privilege on the view to users or groups:

   ```sql
   GRANT SELECT ON VIEW sales_data_view TO <user-or-group>;
   ```

3. Users can now query the view without needing the `USE CONNECTION` privilege:

   ```sql
   SELECT * FROM sales_data_view WHERE region = 'US';
   ```
Important
The view owner must have USE CONNECTION privilege on the connection. When users query the view, the connection access check is performed using the view owner's privileges, not the querying user's privileges.
Examples
Basic query execution
Execute a query on a PostgreSQL database:
```sql
SELECT * FROM remote_query(
  'my_postgres_connection',
  database => 'sales_db',
  query => 'SELECT * FROM orders WHERE order_date > CURRENT_DATE - INTERVAL \'30 days\''
);
```
Query a specific table
Query a MySQL table directly:
```sql
SELECT * FROM remote_query(
  'my_mysql_connection',
  database => 'inventory',
  dbtable => 'my_schema.products'
);
```
Oracle with service name
Query an Oracle database:
```sql
SELECT * FROM remote_query(
  'my_oracle_connection',
  serviceName => 'ORCL',
  query => 'SELECT * FROM customers WHERE ROWNUM <= 1000'
);
```
Snowflake query
Query Snowflake:
```sql
SELECT * FROM remote_query(
  'my_snowflake_connection',
  database => 'ANALYTICS_DB',
  query => 'SELECT * FROM SALES WHERE SALE_DATE >= DATEADD(day, -30, CURRENT_DATE())'
);
```
Performance tuning with partitioning
Fetch data in parallel from a SQL Server table:
```sql
SELECT * FROM remote_query(
  'my_sqlserver_connection',
  database => 'sales',
  dbtable => 'transactions',
  partitionColumn => 'transaction_id',
  lowerBound => '0',
  upperBound => '1000000',
  numPartitions => '10',
  fetchsize => '1000'
);
```
Combine with Databricks SQL operations
Apply additional filters and transformations:
```sql
SELECT customer_id, SUM(amount) AS total_amount
FROM remote_query(
  'my_postgres_connection',
  database => 'orders_db',
  query => 'SELECT customer_id, amount, order_date FROM orders'
)
WHERE order_date >= '2025-01-01'
GROUP BY customer_id
HAVING total_amount > 1000
ORDER BY total_amount DESC
LIMIT 100;
```
Create a view for delegated access
Create a view that wraps the remote_query function. Users with SELECT privilege on the view can query the data without needing USE CONNECTION privilege on the underlying connection:
```sql
CREATE VIEW sales_summary AS
SELECT * FROM remote_query(
  'my_mysql_connection',
  database => 'sales',
  query => 'SELECT region, product, SUM(revenue) as total_revenue FROM sales_data GROUP BY region, product'
);

GRANT SELECT ON VIEW sales_summary TO <user-or-group>;
```
Control pushdown behavior
When you use the remote_query function, Databricks can push down additional operations to the remote database beyond the query you specify. This feature is useful when you query a view that uses the remote_query function.
The following operations can be pushed down:
- Filters: `WHERE` clauses applied to the result of the remote query
- Projections: column selection (`SELECT` specific columns)
- Limit: `LIMIT` clauses to restrict the number of rows returned
- Offset: `OFFSET` clauses to skip rows
- Aggregates: aggregation functions such as `COUNT`, `SUM`, `AVG`, `MAX`, and `MIN`
- Top-N: the combination of `ORDER BY` and `LIMIT` for top or bottom N queries
Pushdown support varies by data source. See the documentation for your specific connection type for details.
Disable specific pushdowns for troubleshooting or compatibility:
```sql
SELECT * FROM remote_query(
  'my_postgres_connection',
  database => 'analytics',
  query => 'SELECT * FROM complex_view',
  `pushdown.aggregates.enabled` => 'false',
  `pushdown.filters.enabled` => 'false'
);
```
Limitations
- Read-only operations: The `remote_query` function supports only `SELECT` queries. Data modification operations (`INSERT`, `UPDATE`, `DELETE`, `MERGE`), DDL operations (`CREATE`, `DROP`, `ALTER`), and stored procedures are not supported.
- Query validation: The query you provide runs directly on the remote database. Databricks verifies that the query is read-only through schema inspection, but syntax and semantic validation are performed by the remote database.
Troubleshooting
Permission errors
If you receive a permission error, verify that:
- You have the `USE CONNECTION` privilege on the connection, or the `SELECT` privilege on a view that wraps the function.
- The credentials in the connection have appropriate permissions on the remote database.
Example error:
```
PERMISSION_DENIED: User does not have USE CONNECTION on Connection 'my_connection'
```
Resolution:
```sql
GRANT USE CONNECTION ON CONNECTION my_connection TO <user-or-group>;
```
Unsupported parameters
If you receive an error about unsupported parameters, check that you are using the correct parameters for your connection type. The error message lists the allowed parameters.
Example error:
```
REMOTE_QUERY_FUNCTION_UNSUPPORTED_CONNECTOR_PARAMETERS: The following parameters are not supported for connection type 'postgresql': 'materializationDataset'. Allowed parameters for this connection type are: 'database', 'query', 'dbtable', 'fetchsize', 'partitionColumn', 'lowerBound', 'upperBound', 'numPartitions'.
```
Resolution: Remove the unsupported parameter and use the correct parameters for your connection type.
DML operations not supported
The remote_query function only supports read-only SELECT queries.
Example error:
DML_OPERATIONS_NOT_SUPPORTED_FOR_REMOTE_QUERY_FUNCTION: Data modification operations are not supported in remote_query function.
Resolution: Remove any INSERT, UPDATE, DELETE, or DDL statements from your query. Only use SELECT statements.