Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Important
Registering Python UDTFs in Unity Catalog is in Public Preview.
A Unity Catalog user-defined table function (UDTF) allows you to register functions that return complete tables instead of scalar values. Unlike scalar functions that return a single result value from each call, UDTFs are invoked in a SQL statement's FROM
clause and can return multiple rows and columns.
UDTFs are particularly useful for:
- Transforming arrays or complex data structures into multiple rows
- Integrating external APIs or services into SQL workflows
- Implementing custom data generation or enrichment logic
- Processing data that requires stateful operations across rows
Each UDTF call can accept zero or more arguments. These arguments can be scalar expressions or table arguments representing entire input tables.
UDTFs can be registered in two ways:
- Unity Catalog: Register the UDTF as a governed object in Unity Catalog.
- Session-scoped: Register to the local
SparkSession
, isolated to the current notebook or job. See Python user-defined table functions (UDTFs).
Requirements
Unity Catalog Python UDTFs are supported on the following compute types:
- Classic compute with standard access mode (Databricks Runtime 17.1 and above)
- SQL warehouse (pro, and classic)
Create a UDTF in Unity Catalog
Use SQL DDL to create a governed UDTF in Unity Catalog. UDTFs are invoked using a SQL statement's FROM
clause.
CREATE OR REPLACE FUNCTION square_numbers(start INT, end INT)
RETURNS TABLE (num INT, squared INT)
LANGUAGE PYTHON
HANDLER 'SquareNumbers'
DETERMINISTIC
AS $$
class SquareNumbers:
"""
Basic UDTF that computes a sequence of integers
and includes the square of each number in the range.
"""
def eval(self, start: int, end: int):
for num in range(start, end + 1):
yield (num, num * num)
$$;
SELECT * FROM square_numbers(1, 5);
+-----+---------+
| num | squared |
+-----+---------+
| 1 | 1 |
| 2 | 4 |
| 3 | 9 |
| 4 | 16 |
| 5 | 25 |
+-----+---------+
Databricks implements Python UDTFs as Python classes with a mandatory eval
method that yields output rows.
Practical examples
The following examples demonstrate real-world use cases for Unity Catalog Python UDTFs, progressing from simple data transformations to complex external integrations.
Example: Re-implementing explode
While Spark provides a built-in explode
function, creating your own version demonstrates the fundamental UDTF pattern of taking a single input and producing multiple output rows.
CREATE OR REPLACE FUNCTION my_explode(arr ARRAY<STRING>)
RETURNS TABLE (element STRING)
LANGUAGE PYTHON
HANDLER 'MyExplode'
DETERMINISTIC
AS $$
class MyExplode:
def eval(self, arr):
if arr is None:
return
for element in arr:
yield (element,)
$$;
Use the function directly in a SQL query:
SELECT element FROM my_explode(array('apple', 'banana', 'cherry'));
+---------+
|| element |
+---------+
|| apple |
|| banana |
|| cherry |
+---------+
Or apply it to existing table data with a LATERAL
join:
SELECT s.*, e.element
FROM my_items AS s,
LATERAL my_explode(s.items) AS e;
Example: IP address geolocation via REST API
This example demonstrates how UDTFs can integrate external APIs directly into your SQL workflow. Instead of requiring separate ETL processes, analysts can enrich data with real-time API calls using familiar SQL syntax.
CREATE OR REPLACE FUNCTION ip_to_location(ip_address STRING)
RETURNS TABLE (city STRING, country STRING)
LANGUAGE PYTHON
HANDLER 'IPToLocationAPI'
AS $$
class IPToLocationAPI:
def eval(self, ip_address):
import requests
api_url = f"https://api.ip-lookup.example.com/{ip_address}"
try:
response = requests.get(api_url)
response.raise_for_status()
data = response.json()
yield (data.get('city'), data.get('country'))
except requests.exceptions.RequestException as e:
# Returns nothing if the ip_address is invalid
return
$$;
Note
Python UDTFs allow TCP/UDP network traffic over ports 80, 443, and 53 using compute configured with standard access mode.
Use the function to enrich web log data with geographic information:
SELECT
l.timestamp,
l.request_path,
geo.city,
geo.country
FROM web_logs AS l,
LATERAL ip_to_location(l.ip_address) AS geo;
This approach enables real-time geographic analysis without requiring pre-processed lookup tables or separate data pipelines. The UDTF handles HTTP requests, JSON parsing, and error handling, making external data sources accessible through standard SQL queries.
Set DETERMINISTIC
if your function produces consistent results
Add DETERMINISTIC
to your function definition if it produces the same outputs for the same inputs. This allows query optimizations to improve performance.
By default, Batch Unity Catalog Python UDTFs are assumed to be non-deterministioc unless explicitly declared. Examples of non-deterministic functions include: generating random values, accessing current times or dates, or making external API calls.
See CREATE FUNCTION (SQL and Python)
Limitations
The following limitations apply to Unity Catalog Python UDTFs:
- Polymorphic table functions are not supported.
- The TABLE argument is not supported.
- Unity Catalog service credentials are not supported.
- Custom dependencies are not supported.