Python user-defined table functions (UDTFs) in Unity Catalog

Important

Registering Python UDTFs in Unity Catalog is in Public Preview.

A Unity Catalog user-defined table function (UDTF) allows you to register functions that return complete tables instead of scalar values. Unlike scalar functions that return a single result value from each call, UDTFs are invoked in a SQL statement's FROM clause and can return multiple rows and columns.

UDTFs are particularly useful for:

  • Transforming arrays or complex data structures into multiple rows
  • Integrating external APIs or services into SQL workflows
  • Implementing custom data generation or enrichment logic
  • Processing data that requires stateful operations across rows

Each UDTF call can accept zero or more arguments. These arguments can be scalar expressions or table arguments representing entire input tables.

UDTFs can be registered in two ways:

  • Unity Catalog: Register the UDTF as a governed object in Unity Catalog.
  • Session-scoped: Register to the local SparkSession, isolated to the current notebook or job. See Python user-defined table functions (UDTFs).

Requirements

Unity Catalog Python UDTFs are supported on the following compute types:

  • Classic compute with standard access mode (Databricks Runtime 17.1 and above)
  • SQL warehouse (pro, and classic)

Create a UDTF in Unity Catalog

Use SQL DDL to create a governed UDTF in Unity Catalog. UDTFs are invoked using a SQL statement's FROM clause.

CREATE OR REPLACE FUNCTION square_numbers(start INT, end INT)
RETURNS TABLE (num INT, squared INT)
LANGUAGE PYTHON
HANDLER 'SquareNumbers'
DETERMINISTIC
AS $$
class SquareNumbers:
    """
    Basic UDTF that computes a sequence of integers
    and includes the square of each number in the range.
    """
    def eval(self, start: int, end: int):
        for num in range(start, end + 1):
            yield (num, num * num)
$$;

SELECT * FROM square_numbers(1, 5);

+-----+---------+
| num | squared |
+-----+---------+
| 1   | 1       |
| 2   | 4       |
| 3   | 9       |
| 4   | 16      |
| 5   | 25      |
+-----+---------+

Databricks implements Python UDTFs as Python classes with a mandatory eval method that yields output rows.

Practical examples

The following examples demonstrate real-world use cases for Unity Catalog Python UDTFs, progressing from simple data transformations to complex external integrations.

Example: Re-implementing explode

While Spark provides a built-in explode function, creating your own version demonstrates the fundamental UDTF pattern of taking a single input and producing multiple output rows.

CREATE OR REPLACE FUNCTION my_explode(arr ARRAY<STRING>)
RETURNS TABLE (element STRING)
LANGUAGE PYTHON
HANDLER 'MyExplode'
DETERMINISTIC
AS $$
class MyExplode:
    def eval(self, arr):
        if arr is None:
            return
        for element in arr:
            yield (element,)
$$;

Use the function directly in a SQL query:

SELECT element FROM my_explode(array('apple', 'banana', 'cherry'));
+---------+
|| element |
+---------+
|| apple   |
|| banana  |
|| cherry  |
+---------+

Or apply it to existing table data with a LATERAL join:

SELECT s.*, e.element
FROM my_items AS s,
LATERAL my_explode(s.items) AS e;

Example: IP address geolocation via REST API

This example demonstrates how UDTFs can integrate external APIs directly into your SQL workflow. Instead of requiring separate ETL processes, analysts can enrich data with real-time API calls using familiar SQL syntax.

CREATE OR REPLACE FUNCTION ip_to_location(ip_address STRING)
RETURNS TABLE (city STRING, country STRING)
LANGUAGE PYTHON
HANDLER 'IPToLocationAPI'
AS $$
class IPToLocationAPI:
    def eval(self, ip_address):
        import requests
        api_url = f"https://api.ip-lookup.example.com/{ip_address}"
        try:
            response = requests.get(api_url)
            response.raise_for_status()
            data = response.json()
            yield (data.get('city'), data.get('country'))
        except requests.exceptions.RequestException as e:
            # Returns nothing if the ip_address is invalid
            return
$$;

Note

Python UDTFs allow TCP/UDP network traffic over ports 80, 443, and 53 using compute configured with standard access mode.

Use the function to enrich web log data with geographic information:

SELECT
  l.timestamp,
  l.request_path,
  geo.city,
  geo.country
FROM web_logs AS l,
LATERAL ip_to_location(l.ip_address) AS geo;

This approach enables real-time geographic analysis without requiring pre-processed lookup tables or separate data pipelines. The UDTF handles HTTP requests, JSON parsing, and error handling, making external data sources accessible through standard SQL queries.

Set DETERMINISTIC if your function produces consistent results

Add DETERMINISTIC to your function definition if it produces the same outputs for the same inputs. This allows query optimizations to improve performance.

By default, Batch Unity Catalog Python UDTFs are assumed to be non-deterministioc unless explicitly declared. Examples of non-deterministic functions include: generating random values, accessing current times or dates, or making external API calls.

See CREATE FUNCTION (SQL and Python)

Limitations

The following limitations apply to Unity Catalog Python UDTFs:

Next steps