cloud_files_state
table-valued function
Applies to: Databricks SQL Databricks Runtime 11.3 LTS and above
Returns the file-level state of an Auto Loader or read_files
stream.
Syntax
cloud_files_state( { TABLE ( table_name ) | checkpoint } )
Arguments
- table_name: The identifier of the streaming table that's being written to by
read_files
. The name must not include a temporal specification. Available in Databricks Runtime 13.3 LTS and above. checkpoint
: ASTRING
literal. The checkpoint directory for a stream using the Auto Loader source. See What is Auto Loader?.
Returns
Returns a table with the following schema:
path STRING NOT NULL PRIMARY KEY
The path of a file.
size BIGINT NOT NULL
The size of a file in bytes.
create_time TIMESTAMP NOT NULL
The time that a file was created.
discovery_time TIMESTAMP NOT NULL
Important
This feature is in Private Preview. To try it, reach out to your Azure Databricks contact.
The time that a file was discovered.
commit_time TIMESTAMP
Important
This feature is in Private Preview. To try it, reach out to your Azure Databricks contact.
The time that a file was committed to the checkpoint after processing.
NULL
if the file is not yet processed. A file might be processed, but might be marked as committed arbitrarily later. Marking the file as committed means that Auto Loader does not require the file for processing again.archive_time TIMESTAMP
Important
This feature is in Private Preview. To try it, reach out to your Azure Databricks contact.
The time that a file was archived.
NULL
if the file has not been archived.source_id STRING
The ID of the Auto Loader source in the streaming query. This value is
'0'
for streams that ingest from a single cloud object store location.
Permissions
You need to have:
OWNER
privileges on the streaming table if using a streaming table identifier.READ FILES
privileges on the checkpoint location if providing a checkpoint under an external location.
Examples
-- Simple example from checkpoint
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint');
/some/input/path
/other/input/path
-- Simple example from source subdir
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint/sources/0');
/some/input/path
/other/input/path
-- Simple example from streaming table
> SELECT path FROM CLOUD_FILES_STATE(TABLE(my_streaming_table));
/some/input/path
/other/input/path