Volumes

Applies to: ✓ Databricks SQL ✓ Databricks Runtime 13.3 LTS and above ✓ Unity Catalog only

Volumes are Unity Catalog objects representing a logical volume of storage in a cloud object storage location. Volumes provide capabilities for accessing, storing, governing, and organizing files. While tables provide governance over tabular datasets, volumes add governance over non-tabular datasets. You can use volumes to store and access files in any format, including structured, semi-structured, and unstructured data.

Volumes are siblings to tables, views, and other objects organized under a schema in Unity Catalog.

A volume can be managed or external.

For more details and limitations, see What are Unity Catalog volumes?.

Managed volume

A managed volume is a Unity Catalog-governed storage volume created within the managed storage location of the containing schema. Managed volumes allow the creation of governed storage for working with files without the overhead of external locations and storage credentials. You do not need to specify a location when creating a managed volume, and all file access for data in managed volumes is through paths managed by Unity Catalog.
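For illustration, a minimal sketch (the catalog, schema, and volume names are placeholders, and the catalog and schema are assumed to already exist):

--- Create a managed volume; Unity Catalog manages the storage location
> CREATE VOLUME myCatalog.mySchema.myQuickstartVolume
    COMMENT 'A managed volume needs no LOCATION clause';
 OK

--- All file access goes through the Unity Catalog-managed path
> LIST '/Volumes/mycatalog/myschema/myquickstartvolume'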

External volume

An external volume is a Unity Catalog-governed storage volume registered against a directory within an external location.

Volume naming and reference

A volume name is an identifier that can be qualified with a catalog and schema name in SQL commands.
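For example (a sketch; the volume name is a placeholder), a fully qualified reference uses all three levels:

> DESCRIBE VOLUME myCatalog.mySchema.myExternalVolume;

After setting the current catalog and schema with USE CATALOG and USE SCHEMA, the unqualified name myExternalVolume resolves to the same volume.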

The path to access files in volumes uses the following format:

/Volumes/<catalog_identifier>/<schema_identifier>/<volume_identifier>/<path>/<file_name>

Note that Azure Databricks normalizes the identifiers to lower case.

Azure Databricks also supports an optional dbfs:/ scheme, so the following path also works:

dbfs:/Volumes/<catalog_identifier>/<schema_identifier>/<volume_identifier>/<path>/<file_name>

Note

You can also access data in external volumes using cloud storage URIs.
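For example (a sketch reusing the external location path from the examples below; reading by URI requires the appropriate privileges on the underlying external location):

> SELECT * FROM csv.`abfss://my-container@my-storage-account.dfs.core.windows.net/my-path/sample.csv`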

Managing files in volumes

Applies to: ✓ Databricks SQL Connector

Using the Databricks SQL Connector, you can manage files in volumes with the following commands (see the sketch after this list):

  • PUT INTO to copy a file from your local storage into a volume.
  • GET to copy a file from a volume to your local storage.
  • REMOVE to remove a file from a volume.
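A minimal sketch of these commands (all paths are placeholders; with the Python connector, local paths must fall under the connection's staging_allowed_local_path):

--- Copy a local file into a volume, overwriting any existing file
> PUT '/tmp/sample.csv' INTO '/Volumes/mycatalog/myschema/mymanagedvolume/sample.csv' OVERWRITE

--- Copy a file from a volume back to local storage
> GET '/Volumes/mycatalog/myschema/mymanagedvolume/sample.csv' TO '/tmp/sample_downloaded.csv'

--- Remove a file from a volume
> REMOVE '/Volumes/mycatalog/myschema/mymanagedvolume/sample.csv'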

Examples

--- Create an external volume under the directory "my-path"
> CREATE EXTERNAL VOLUME IF NOT EXISTS myCatalog.mySchema.myExternalVolume
        COMMENT 'This is my example external volume'
        LOCATION 'abfss://my-container@my-storage-account.dfs.core.windows.net/my-path';
 OK

--- Set the current catalog
> USE CATALOG myCatalog;
 OK

--- Set the current schema
> USE SCHEMA mySchema;
 OK

--- Create a managed volume; it is not necessary to specify a location
> CREATE VOLUME myManagedVolume
    COMMENT 'This is my example managed volume';
 OK

--- List the files inside the volume; note that all identifiers are lowercase
> LIST '/Volumes/mycatalog/myschema/myexternalvolume'
 sample.csv

> LIST 'dbfs:/Volumes/mycatalog/myschema/mymanagedvolume'
 sample.csv

--- Query the contents of a CSV file
> SELECT * FROM csv.`/Volumes/mycatalog/myschema/myexternalvolume/sample.csv`
 20

> SELECT * FROM csv.`dbfs:/Volumes/mycatalog/myschema/mymanagedvolume/sample.csv`
 20