Explore storage and find data files
This article focuses on discovering and exploring directories and data files managed with Unity Catalog volumes, including UI-based instructions for exploring volumes with Catalog Explorer. This article also provides examples for programmatic exploration of data in cloud object storage using volume paths and cloud URIs.
Databricks recommends using volumes to manage access to data in cloud object storage. For more information on connecting to data in cloud object storage, see Connect to data sources.
For a full walkthrough of how to interact with files in all locations, see Work with files on Azure Databricks.
Important
When searching for Files in the workspace UI, you might discover data files stored as workspace files. Databricks recommends using workspace files primarily for code (such as scripts and libraries), init scripts, or configuration files. You should ideally limit data stored as workspace files to small datasets that might be used for tasks such as testing during development and QA. See What are workspace files?.
Volumes vs. legacy cloud object configurations
When you use volumes to manage access to data in cloud object storage, you can access the data only through volume paths, which are available on all Unity Catalog-enabled compute. You cannot use volumes to register the data files that back Unity Catalog tables. Databricks recommends using table names instead of file paths to interact with structured data registered as Unity Catalog tables. See How do paths work for data managed by Unity Catalog?.
If you use a legacy method for configuring access to data in cloud object storage, Azure Databricks falls back to legacy table ACL permissions. Users who want to access data using cloud URIs from SQL warehouses or compute configured with shared access mode require the ANY FILE permission. See Hive metastore table access control (legacy).
Azure Databricks provides several APIs for listing files in cloud object storage. Most examples in this article focus on using volumes. For examples on interacting with data on object storage configured without volumes, see List files with URIs.
Explore volumes
You can use Catalog Explorer to explore data in volumes and review the details of a volume. You can view only volumes that you have permission to read, so you can query any data you discover this way.
You can use SQL to explore volumes and their metadata. To list files in volumes, you can use SQL, the %fs magic command, or Databricks utilities. When interacting with data in volumes, you use the path provided by Unity Catalog, which always has the following format:
/Volumes/catalog_name/schema_name/volume_name/path/to/data
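Because every volume path follows this fixed format, you can assemble paths programmatically rather than hand-typing them. The following is a minimal sketch, not part of any Databricks API; the `volume_path` helper name is a hypothetical convenience for illustration:

```python
def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build the /Volumes/... path Unity Catalog exposes for a volume."""
    # Join the fixed prefix, the three-level namespace, and any path segments.
    return "/".join(["/Volumes", catalog, schema, volume, *parts])


print(volume_path("main", "sales", "raw", "2024", "orders.csv"))
# /Volumes/main/sales/raw/2024/orders.csv
```

A helper like this keeps notebook code free of path-typo bugs when the same volume is referenced in many cells.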
Display volumes
SQL
Run the following command to see a list of volumes in a given schema.
SHOW VOLUMES IN catalog_name.schema_name;
See SHOW VOLUMES.
Catalog Explorer
To display volumes in a given schema with Catalog Explorer, do the following:
- Select the Catalog icon.
- Select a catalog.
- Select a schema.
- Click Volumes to expand all volumes in the schema.
Note
If no volumes are registered to a schema, the Volumes option is not displayed. Instead, you see a list of available tables.
See volume details
SQL
Run the following command to describe a volume.
DESCRIBE VOLUME volume_name
See DESCRIBE VOLUME.
Catalog Explorer
Click the volume name and select the Details tab to review volume details.
See files in volumes
SQL
Run the following command to list the files in a volume.
LIST '/Volumes/catalog_name/schema_name/volume_name/'
Catalog Explorer
Click the volume name to browse the directories and files stored in the volume.
%fs
Run the following command to list the files in a volume.
%fs ls /Volumes/catalog_name/schema_name/volume_name/
Databricks utilities
Run the following command to list the files in a volume.
dbutils.fs.ls("/Volumes/catalog_name/schema_name/volume_name/")
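In a notebook, `dbutils.fs.ls` returns a list of entries that expose attributes such as `path` and `name`, which you can filter with ordinary Python. The sketch below uses a namedtuple as a stand-in for those entries so it runs anywhere; the `csv_files` helper is hypothetical, added only for illustration:

```python
from collections import namedtuple

# Stand-in for the listing entries dbutils.fs.ls returns (illustrative only).
Entry = namedtuple("Entry", ["path", "name", "size"])


def csv_files(entries):
    """Keep only entries whose name ends in .csv; directory names end in '/'."""
    return [e for e in entries if e.name.endswith(".csv")]


listing = [
    Entry("/Volumes/main/sales/raw/2024/", "2024/", 0),
    Entry("/Volumes/main/sales/raw/orders.csv", "orders.csv", 1024),
]
print([e.name for e in csv_files(listing)])
# ['orders.csv']
```

On Databricks you would pass the real listing instead: `csv_files(dbutils.fs.ls("/Volumes/catalog_name/schema_name/volume_name/"))`.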
List files with URIs
You can query cloud object storage configured with methods other than volumes using URIs. You must be connected to compute with privileges to access the cloud location. The ANY FILE permission is required on SQL warehouses and compute configured with shared access mode.
Note
URI access to object storage configured with volumes is not supported. You cannot use Catalog Explorer to review contents of object storage not configured with volumes.
The following examples include example URIs for data stored with Azure Data Lake Storage Gen2 and S3.
SQL
Run the following command to list files in cloud object storage.
-- ADLS 2
LIST 'abfss://container-name@storage-account-name.dfs.core.chinacloudapi.cn/path/to/data'
-- S3
LIST 's3://bucket-name/path/to/data'
%fs
Run the following command to list files in cloud object storage.
# ADLS 2
%fs ls abfss://container-name@storage-account-name.dfs.core.chinacloudapi.cn/path/to/data
# S3
%fs ls s3://bucket-name/path/to/data
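The abfss URIs above also follow a fixed shape, so they can be assembled from their parts. This helper is an assumption for illustration, not a Databricks API; the endpoint suffix defaults to the one used in the examples above:

```python
def abfss_uri(container: str, account: str, path: str = "",
              suffix: str = "dfs.core.chinacloudapi.cn") -> str:
    """Build an abfss:// URI from a container, storage account, and path."""
    base = f"abfss://{container}@{account}.{suffix}"
    # Append the object path, if any, without doubling the slash.
    return f"{base}/{path.lstrip('/')}" if path else base


print(abfss_uri("container-name", "storage-account-name", "path/to/data"))
# abfss://container-name@storage-account-name.dfs.core.chinacloudapi.cn/path/to/data
```

You could then pass the result to a listing call such as `dbutils.fs.ls(...)` on compute with access to that location.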