What are workspace files?
A workspace file is any file in your Azure Databricks workspace file tree other than the following types:
- Notebooks
- Queries
- Dashboards
- Genie spaces
- Experiments
Other than these excluded types, workspace files can be any file type. Common examples include:
- `.py` files used in custom modules.
- `.md` files, such as `README.md`.
- `.csv` or other small data files.
- `.txt` files.
- `.whl` libraries.
- Log files.
For recommendations on working with files, see Recommendations for files in volumes and workspace files.
Your Azure Databricks workspace file tree can contain folders attached to a Git repository, called Databricks Git folders. Git folders have some additional limitations on supported file types. For a list of file types supported in Git folders (formerly "Repos"), see Manage file assets in Databricks Git folders.
Important
Workspace files are enabled everywhere by default in Databricks Runtime 11.2 and above. For production workloads, use Databricks Runtime 11.3 LTS or above. Contact your workspace administrator if you cannot access this functionality.
What you can do with workspace files
Azure Databricks provides functionality similar to local development for many workspace file types, including a built-in file editor. Not all use cases for all file types are supported.
You can create, edit, and manage access to workspace files using familiar patterns from notebook interactions. You can use relative paths for library imports from workspace files, similar to local development. For more details, see:
- Workspace files basic usage
- Programmatically interact with workspace files
- Work with Python and R modules
- Display images
- Manage notebooks
- File ACLs
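As a minimal sketch of how relative library imports from workspace files behave, the following uses a temporary directory to stand in for a workspace folder; `helpers.py` and `add_tax()` are hypothetical names, not part of Databricks:

```python
import os
import sys
import tempfile

# Create a stand-in "workspace folder" containing a small Python module.
# In a real workspace, helpers.py would sit next to your notebook.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "helpers.py"), "w") as f:
    f.write("def add_tax(price, rate=0.08):\n    return price * (1 + rate)\n")

# A module is importable once its directory is on sys.path. In a Databricks
# notebook, the notebook's own directory is typically already importable,
# so modules stored alongside the notebook can be imported directly.
sys.path.append(workdir)
from helpers import add_tax
```

This mirrors local development: code in a workspace file next to your notebook is imported by module name, with no packaging step.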
Init scripts stored in workspace files have special behavior. You can use workspace files to store and reference init scripts in any Databricks Runtime versions. See Store init scripts in workspace files.
Note
In Databricks Runtime 14.0 and above, the default current working directory (CWD) for code executed locally is the directory containing the notebook or script being run. This is a change in behavior from Databricks Runtime 13.3 LTS and below. See What is the default current working directory?.
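The practical effect of the CWD change is how relative paths resolve. A minimal sketch (the relative path `data/input.csv` is a hypothetical example):

```python
import os

# Relative paths resolve against the current working directory. On Databricks
# Runtime 14.0 and above, for code run locally, the CWD is the directory
# containing the notebook or script; on 13.3 LTS and below it is typically a
# driver directory instead.
cwd = os.getcwd()

# A relative path such as "data/input.csv" resolves against cwd:
resolved = os.path.join(cwd, "data", "input.csv")
```

On the newer runtimes, this means a notebook can reference data files stored next to it with plain relative paths.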
Limitations
For a complete list of workspace file limitations, see Workspace files limitations.
File size limit
Individual workspace files are limited to 500 MB.
Users can upload files up to 500 MB from the UI. The maximum file size allowed when writing from a cluster is 256 MB.
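To guard against these limits before writing, a pre-check like the following can help. This is a hypothetical helper, not a Databricks API; the limits are the 500 MB UI upload cap and 256 MB cluster-write cap stated above:

```python
import os
import tempfile

UI_UPLOAD_LIMIT = 500 * 1024 * 1024      # 500 MB: maximum upload via the UI
CLUSTER_WRITE_LIMIT = 256 * 1024 * 1024  # 256 MB: maximum write from a cluster

def fits_workspace_limits(path, via_cluster=True):
    """Return True if the file at `path` is within the relevant size limit."""
    limit = CLUSTER_WRITE_LIMIT if via_cluster else UI_UPLOAD_LIMIT
    return os.path.getsize(path) <= limit

# Example: a 1 KB sample file is well within both limits.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)
small_ok = fits_workspace_limits(tmp.name)
```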
Databricks Runtime versions for files in Git folders on clusters with Azure Databricks Container Services
On clusters running Databricks Runtime 11.3 LTS and above, the default settings allow you to use workspace files in Git folders with Azure Databricks Container Services (DCS).
On clusters running Databricks Runtime 10.4 LTS or 9.1 LTS, you must configure the Dockerfile to access workspace files in Git folders on a cluster with DCS. See Customize containers with Databricks Container Services.
Enable workspace files
Workspace files are enabled by default. To check whether support for non-notebook files is enabled in your Databricks workspace, call the /api/2.0/workspace-conf REST API from a notebook or other environment with access to your Databricks workspace, and get the value of the enableWorkspaceFileSystem key. If it is set to true, non-notebook files are already enabled for your workspace.
The following example demonstrates how you can call this API from a notebook to check if workspace files are disabled and if so, re-enable them.
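A minimal sketch of such a check follows, using only the standard library. The workspace URL and token values are assumptions you must supply (for example, via the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables); the endpoint and the enableWorkspaceFileSystem key are from the workspace-conf API described above:

```python
import json
import os
import urllib.request

# Assumptions: supply your own workspace URL and a personal access token.
HOST = os.environ.get("DATABRICKS_HOST", "https://adb-example.azuredatabricks.net")
TOKEN = os.environ.get("DATABRICKS_TOKEN", "")

def workspace_conf_url(host, key):
    """Build the workspace-conf URL that queries a single configuration key."""
    return f"{host}/api/2.0/workspace-conf?keys={key}"

def get_conf(host, token, key):
    """Return the value of a workspace configuration key as a string."""
    req = urllib.request.Request(
        workspace_conf_url(host, key),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get(key)

def enable_workspace_files(host, token):
    """PATCH workspace-conf to set enableWorkspaceFileSystem to true."""
    body = json.dumps({"enableWorkspaceFileSystem": "true"}).encode()
    req = urllib.request.Request(
        f"{host}/api/2.0/workspace-conf",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    urllib.request.urlopen(req)

# Only call the API when credentials are configured.
if TOKEN:
    if get_conf(HOST, TOKEN, "enableWorkspaceFileSystem") != "true":
        enable_workspace_files(HOST, TOKEN)
```

In a notebook you can replace the token lookup with your preferred credential mechanism; the check-then-patch flow is the same.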