Configure Auto Loader streams in file notification mode

This page describes how to configure Auto Loader streams to use file notification mode to incrementally discover and ingest cloud data.

In file notification mode, Auto Loader automatically sets up a notification service and queue service that subscribes to file events from the input directory. You can use file notifications to scale Auto Loader to ingest millions of files an hour. When compared to directory listing mode, file notification mode is more performant and scalable.

You can switch between file notifications and directory listing at any time and still maintain exactly-once data processing guarantees.

Note

File notification mode isn't supported for Azure premium storage accounts because premium accounts don't support queue storage.

Warning

Changing the source path for Auto Loader is not supported for file notification mode. If file notification mode is used and the path is changed, you might fail to ingest files that are already present in the new directory at the time of the directory update.

File notification mode with and without file events enabled on external locations

There are two ways to configure Auto Loader to use file notification mode:

(Recommended) File events: you use a single file notification queue for all streams that process files from a given external location.

This approach has the following advantages over the legacy file notification mode:
- Azure Databricks can set up subscriptions and file events in your cloud storage account for you without requiring that you supply additional credentials to Auto Loader using a service credential or other cloud-specific authentication options. See (Recommended) Enable file events for an external location.
- You have fewer Azure managed identity policies to create in your cloud storage account.
- Because you no longer need to create a queue for each Auto Loader stream, it's easier to avoid hitting the cloud provider notification limits listed in Cloud resources used in legacy Auto Loader file notification mode.
- Azure Databricks automatically manages the tuning of resource requirements, so you don't need to tune parameters such as cloudFiles.fetchParallelism.
- Cleanup functionality means that you don't need to worry as much about the lifecycle of notifications that are created in the cloud, such as when a stream is deleted or fully refreshed.

If you use Auto Loader in directory listing mode, Databricks recommends migrating to file notification mode with file events. Auto Loader with file events offers significant performance improvements. Start by enabling file events for your external location, then set cloudFiles.useManagedFileEvents in your Auto Loader stream configuration.

Legacy file notification mode: You manage file notification queues for each Auto Loader stream separately. Auto Loader automatically sets up a notification service and queue service that subscribes to file events from the input directory.

This is the legacy approach.

Use file notification mode with file events

This section describes how to create and updated Auto Loader streams to use file events.

Before you begin

Setting up file events requires:

An Azure Databricks workspace that is enabled for Unity Catalog.
Permission to create storage credential and external location objects in Unity Catalog.

Auto Loader streams with file events require:

Compute on Databricks Runtime 14.3 LTS or above.

Configuration instructions

The following instructions apply whether you are creating new Auto Loader streams or migrating existing streams to use the upgraded file notification mode with file events:

Create a storage credential and external location in Unity Catalog that grant access to the source location in cloud storage for your Auto Loader streams.
Enable file events for the external location. See (Recommended) Enable file events for an external location.
When you create a new Auto Loader stream or edit an existing one to work with the external location:
- If you have existing notifications-based Auto Loader streams that consume data from the external location, switch them off and delete the associated notification resources.
- Ensure that pathRewrites is not set (this is not a common option).
- Review the list of settings that Auto Loader ignores when it manages file notifications using file events. Avoid them in new Auto Loader streams and remove them from existing streams that you are migrating to this mode.
- Set the option cloudFiles.useManagedFileEvents to true in your Auto Loader code.

For example:

autoLoaderStream = (spark.readStream
  .format("cloudFiles")
  ...
  .options("cloudFiles.useManagedFileEvents", True)
  ...)

If you're using Lakeflow Spark Declarative Pipelines and you already have a pipeline with a streaming table, update it to include the useManagedFileEvents option:

CREATE OR REFRESH STREAMING LIVE TABLE <table-name>
AS SELECT <select clause expressions>
  FROM STREAM read_files('abfss://path/to/external/location/or/volume',
                   format => '<format>',
                   useManagedFileEvents => 'True'
                   ...
                   );

Unsupported Auto Loader settings

The following Auto Loader settings are unsupported when streams use file events:

Setting	Change
`useIncremental`	You no longer need to decide between the efficiency of file notifications and the simplicity of directory listing. Auto Loader with file events comes in one mode.
`useNotifications`	There is only one queue and storage event subscription per external location.
`cloudFiles.fetchParallelism`	Auto Loader with file events does not offer a manual parallelism optimization.
`cloudFiles.backfillInterval`	Azure Databricks handles backfill automatically for external locations that are enabled for file events.
`cloudFiles.pathRewrites`	This option applies only when you mount external data locations to the DBFS, which is deprecated.
`resourceTags`	You should set resource tags using the cloud console.

Limitations on Auto Loader with file events

The file events service optimizes file discovery by caching the most recently created files. If Auto Loader runs infrequently, this cache can expire, and Auto Loader falls back to directory listing to discover files and update the cache. To avoid this scenario, invoke Auto Loader at least once every seven days.

For a general list of limitations on file events, see File events limitations.

Manage file notification queues for each Auto Loader stream separately (legacy)

Important

You need elevated permissions to automatically configure cloud infrastructure for file notification mode. Contact your cloud administrator or workspace admin. See:

Required permissions for configuring file notification for Azure Data Lake Storage and Azure Blob Storage

Cloud resources used in legacy Auto Loader file notification mode

Auto Loader can set up file notifications for you automatically when you set the option cloudFiles.useNotifications to true and provide the necessary permissions to create cloud resources. In addition, you might need to provide additional options to grant Auto Loader authorization to create these resources.

The following table lists the resources that are created by Auto Loader for each cloud provider.

Cloud Storage	Subscription Service	Queue Service	Prefix *	Limit **
ADLS	Azure Event Grid	Azure Queue Storage	databricks	500 per storage account
Azure Blob Storage	Azure Event Grid	Azure Queue Storage	databricks	500 per storage account

* Auto Loader names the resources with this prefix.

** How many concurrent file notification pipelines can be launched

If you must run more file-notification-based Auto Loader streams than allowed by these limits, you can use file events or a service such as AWS Lambda, Azure Functions to fan out notifications from a single queue that listens to an entire container or bucket into directory-specific queues.

Legacy file notification events

Azure Data Lake Storage provides different event notifications for files that appear in your storage container.

Auto Loader listens for the FlushWithClose event for processing a file.
Auto Loader streams support the RenameFile action for discovering files. RenameFile actions require an API request to the storage system to get the size of the renamed file.
Auto Loader streams created with Databricks Runtime 9.0 and after support the RenameDirectory action for discovering files. RenameDirectory actions require API requests to the storage system to list the contents of the renamed directory.

Note

Cloud providers do not guarantee 100% delivery of all file events under very rare conditions and do not provide strict SLAs on the latency of the file events. Databricks recommends that you trigger regular backfills with Auto Loader by using the cloudFiles.backfillInterval option to guarantee that all files are discovered within a given SLA if data completeness is a requirement. Triggering regular backfills does not cause duplicates.

Required permissions for configuring file notification for Azure Data Lake Storage and Azure Blob Storage

You must have read permissions for the input directory. See Azure Blob Storage.

To use file notification mode, you must provide authentication credentials for setting up and accessing the event notification services.

You can authenticate using one of the following methods:

In Databricks Runtime 16.1 and above: Databricks service credential (recommended): Create a service credential using a managed identity and a Databricks access connector.
Service principal: Create a Microsoft Entra ID (formerly Azure Active Directory) app and service principal in the form of client ID and client secret.

After obtaining authentication credentials, assign the necessary permissions either to the Databricks access connector (for service credentials) or to the Microsoft Entra ID app (for a service principal).

Using Azure built-in roles

Assign the access connector the following roles to the storage account in which the input path resides:
- Contributor: This role is for setting up resources in your storage account, such as queues and event subscriptions.
- Storage Queue Data Contributor: This role is for performing queue operations such as retrieving and deleting messages from the queues. This role is required only when you provide a service principal without a connection string.
Assign this access connector the following role to the related resource group:
- EventGrid EventSubscription Contributor: This role is for performing Azure Event Grid (Event Grid) subscription operations such as creating or listing event subscriptions.
For more information, see Assign Azure roles using the Azure portal.

Using a custom role

If you are concerned with the excessive permissions required for the preceding roles, you can create a Custom Role with at least the following permissions, listed below in Azure role JSON format:

"permissions": [
  {
    "actions": [
      "Microsoft.EventGrid/eventSubscriptions/write",
      "Microsoft.EventGrid/eventSubscriptions/read",
      "Microsoft.EventGrid/eventSubscriptions/delete",
      "Microsoft.EventGrid/locations/eventSubscriptions/read",
      "Microsoft.Storage/storageAccounts/read",
      "Microsoft.Storage/storageAccounts/write",
      "Microsoft.Storage/storageAccounts/queueServices/read",
      "Microsoft.Storage/storageAccounts/queueServices/write",
      "Microsoft.Storage/storageAccounts/queueServices/queues/write",
      "Microsoft.Storage/storageAccounts/queueServices/queues/read",
      "Microsoft.Storage/storageAccounts/queueServices/queues/delete"
  ],
    "notActions": [],
    "dataActions": [
      "Microsoft.Storage/storageAccounts/queueServices/queues/messages/delete",
      "Microsoft.Storage/storageAccounts/queueServices/queues/messages/read",
      "Microsoft.Storage/storageAccounts/queueServices/queues/messages/write",
      "Microsoft.Storage/storageAccounts/queueServices/queues/messages/process/action"
    ],
    "notDataActions": []
  }
]

Then, you can assign this custom role to your access connector.

For more information, see Assign Azure roles using the Azure portal.

Manually configure or manage file notification resources

Privileged users can manually configure or manage file notification resources.

Set up the file notification services manually through the cloud provider and manually specify the queue identifier. See File notification options for more details.
Use Scala APIs to create or manage the notifications and queuing services, as shown in the following example:

Note

You must have appropriate permissions to configure or modify cloud infrastructure. See permissions documentation for Azure, S3, or GCS.

Python

# Databricks notebook source
# MAGIC %md ## Python bindings for CloudFiles Resource Managers for all 3 clouds

# COMMAND ----------

#######################################
## Creating a ResourceManager in Azure
#######################################

# Using a Databricks service credential
manager = spark._jvm.com.databricks.sql.CloudFilesAzureResourceManager \
  .newManager() \
  .option("cloudFiles.resourceGroup", <resource-group>) \
  .option("cloudFiles.subscriptionId", <subscription-id>) \
  .option("databricks.serviceCredential", <service-credential-name>) \
  .option("path", <path-to-specific-container-and-folder>) \
  .create()

# Using an Azure service principal
manager = spark._jvm.com.databricks.sql.CloudFilesAzureResourceManager \
  .newManager() \
  .option("cloudFiles.connectionString", <connection-string>) \
  .option("cloudFiles.resourceGroup", <resource-group>) \
  .option("cloudFiles.subscriptionId", <subscription-id>) \
  .option("cloudFiles.tenantId", <tenant-id>) \
  .option("cloudFiles.clientId", <service-principal-client-id>) \
  .option("cloudFiles.clientSecret", <service-principal-client-secret>) \
  .option("path", <path-to-specific-container-and-folder>) \
  .create()


# Set up a queue and a topic subscribed to the path provided in the manager.
manager.setUpNotificationServices(<resource-suffix>)

# List notification services created by <AL>
from pyspark.sql import DataFrame
df = DataFrame(manager.listNotificationServices(), spark)

# Tear down the notification services created for a specific stream ID.
# Stream ID is a GUID string that you can find in the list result above.
manager.tearDownNotificationServices(<stream-id>)

Scala


///////////////////////////////////////
// Creating a ResourceManager in Azure
///////////////////////////////////////

import com.databricks.sql.CloudFilesAzureResourceManager

/**
 * Using a Databricks service credential
 */
val manager = CloudFilesAzureResourceManager
  .newManager
  .option("cloudFiles.resourceGroup", <resource-group>)
  .option("cloudFiles.subscriptionId", <subscription-id>)
  .option("databricks.serviceCredential", <service-credential-name>)
  .option("path", <path-to-specific-container-and-folder>) // required only for setUpNotificationServices
  .create()

/**
 * Using an Azure service principal
 */
val manager = CloudFilesAzureResourceManager
  .newManager
  .option("cloudFiles.connectionString", <connection-string>)
  .option("cloudFiles.resourceGroup", <resource-group>)
  .option("cloudFiles.subscriptionId", <subscription-id>)
  .option("cloudFiles.tenantId", <tenant-id>)
  .option("cloudFiles.clientId", <service-principal-client-id>)
  .option("cloudFiles.clientSecret", <service-principal-client-secret>)
  .option("path", <path-to-specific-container-and-folder>) // required only for setUpNotificationServices
  .create()


// Set up a queue and a topic subscribed to the path provided in the manager.
manager.setUpNotificationServices(<resource-suffix>)

// List notification services created by <AL>
val df = manager.listNotificationServices()

// Tear down the notification services created for a specific stream ID.
// Stream ID is a GUID string that you can find in the list result above.
manager.tearDownNotificationServices(<stream-id>)

Use setUpNotificationServices(<resource-suffix>) to create a queue and a subscription with the name <prefix>-<resource-suffix> (the prefix depends on the storage system summarized in Cloud resources used in legacy Auto Loader file notification mode. If there is an existing resource with the same name, Azure Databricks reuses the existing resource instead of creating a new one. This function returns a queue identifier that you can pass to the cloudFiles source using the identifier in File notification options. This enables the cloudFiles source user to have fewer permissions than the user who creates the resources.

Provide the "path" option to newManager only if calling setUpNotificationServices; it is not needed for listNotificationServices or tearDownNotificationServices. This is the same path that you use when running a streaming query.

The following matrix indicates which API methods are supported in which Databricks Runtime for each type of storage:

Cloud Storage	Setup API	List API	Tear down API
ADLS	All versions	All versions	All versions
Azure Blob Storage	All versions	All versions	All versions

Clean up event notification resources created by Auto Loader

Auto Loader doesn't automatically tear down file notification resources. To tear down file notification resources, you must use the cloud resource manager as shown in the previous section. You can also delete these resources manually using the cloud provider's UI or APIs.

Troubleshoot common errors

This section describes common errors when using Auto Loader with file notification mode and how to resolve them.

Failed to create Event Grid subscription

If you see the following error message when you run Auto Loader for the first time, Event Grid is not registered as a Resource Provider in the Azure subscription.

java.lang.RuntimeException: Failed to create event grid subscription.

To register Event Grid as a resource provider, do the following:

In the Azure portal, go to your subscription.
Click Resource Providers under the Settings section.
Register the provider Microsoft.EventGrid.

Authorization required to perform Event Grid subscription operations

If you see the following error message when you run Auto Loader for the first time, confirm that the Contributor role is assigned to the service principal for Event Grid and the storage account.

403 Forbidden ... does not have authorization to perform action 'Microsoft.EventGrid/eventSubscriptions/[read|write]' over scope ...

Event Grid client bypasses proxy

In Databricks Runtime 15.2 and above, Event Grid connections in Auto Loader use proxy settings from system properties by default. In Databricks Runtime 13.3 LTS, 14.3 LTS, and 15.0 to 15.2, you can manually configure Event Grid connections to use a proxy by setting the Spark Config property spark.databricks.cloudFiles.eventGridClient.useSystemProperties true. See Set Spark configuration properties on Azure Databricks.

Last updated on 2026-01-26

Configure Auto Loader streams in file notification mode

File notification mode with and without file events enabled on external locations

Use file notification mode with file events

Before you begin

Configuration instructions

Unsupported Auto Loader settings

Limitations on Auto Loader with file events

Manage file notification queues for each Auto Loader stream separately (legacy)

Cloud resources used in legacy Auto Loader file notification mode

Legacy file notification events

Required permissions for configuring file notification for Azure Data Lake Storage and Azure Blob Storage

Manually configure or manage file notification resources

Python

Scala

Clean up event notification resources created by Auto Loader

Troubleshoot common errors

Failed to create Event Grid subscription

Authorization required to perform Event Grid subscription operations

Event Grid client bypasses proxy

Additional resources