Using Unity Catalog with Structured Streaming
Use Structured Streaming with Unity Catalog to manage data governance for your incremental and streaming workloads on Azure Databricks. This document outlines supported functionality and suggests best practices for using Unity Catalog and Structured Streaming together.
What Structured Streaming functionality does Unity Catalog support?
Unity Catalog does not add any explicit limits for Structured Streaming sources and sinks available on Azure Databricks. The Unity Catalog data governance model allows you to stream data from managed and external tables in Unity Catalog. You can also use external locations managed by Unity Catalog to interact with data using object storage URIs. You can write to external tables using either table names or file paths. Managed tables in Unity Catalog can only be accessed by table name.
Use external locations managed by Unity Catalog when specifying paths for Structured Streaming checkpoints. To learn more about securely connecting storage with Unity Catalog, see Connect to cloud object storage using Unity Catalog.
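The following is a minimal sketch of the pattern described above, not a definitive implementation: it reads a stream from one Unity Catalog table by name, writes to another table by name, and stores the checkpoint under a path that is assumed to fall inside a Unity Catalog external location. The helper names (qualified_name, start_copy_stream), the catalog, schema, and table names, and the abfss:// path are all hypothetical placeholders. The sketch assumes a Databricks Runtime whose Spark version supports the availableNow trigger, and it does not quote identifiers that contain special characters.

```python
def qualified_name(catalog, schema, table):
    """Build a three-level Unity Catalog name: catalog.schema.table.

    Hypothetical helper for illustration; it does not backtick-quote
    identifiers that contain special characters.
    """
    parts = [catalog, schema, table]
    if not all(parts):
        raise ValueError("catalog, schema, and table must all be non-empty")
    return ".".join(parts)


def start_copy_stream(spark, source_table, target_table, checkpoint_path):
    """Incrementally copy rows from one Unity Catalog table to another.

    `spark` is an active SparkSession (predefined in Databricks notebooks).
    `checkpoint_path` is assumed to be a URI covered by an external
    location that the caller has permission to read and write.
    """
    return (
        spark.readStream.table(source_table)            # read by table name
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # checkpoint in an external location
        .trigger(availableNow=True)                     # process available data, then stop
        .toTable(target_table)                          # write by table name
    )


# Example values (placeholders, not taken from the documentation):
SOURCE = qualified_name("main", "default", "events")
TARGET = qualified_name("main", "default", "events_copy")
CHECKPOINT = "abfss://container@account.dfs.core.windows.net/checkpoints/events_copy"

# In a Databricks notebook, where `spark` already exists:
# query = start_copy_stream(spark, SOURCE, TARGET, CHECKPOINT)
# query.awaitTermination()
```

Both tables are addressed by name, which works for managed and external tables alike. Only the checkpoint is addressed by path, which is why it needs to sit under an external location.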
Structured Streaming feature support depends on the Databricks Runtime version you are running and on whether you are using assigned or shared cluster access mode. For details, see Streaming limitations for Unity Catalog.
For an end-to-end demo using Structured Streaming on Unity Catalog, see Tutorial: Run an end-to-end lakehouse analytics pipeline.
What Structured Streaming functionality is not supported on Unity Catalog?
For a list of Structured Streaming features that are not supported on Unity Catalog, see Streaming limitations for Unity Catalog.