What is the default current working directory?

This article describes how the default current working directory (CWD) works for notebook and file execution.

Note

Use Databricks Runtime 14.0 and above with default workspace configurations for more consistent CWD behavior throughout the workspace.

There are two default CWD behaviors for code executed locally in notebooks and files:

  1. The CWD is the directory containing the notebook or script being run.
  2. The CWD is a directory on the ephemeral storage volume attached to the driver.

This CWD behavior affects all code, including %sh and Python or R code that doesn't use Apache Spark. The behavior is determined by code language, Databricks Runtime version, workspace path, and workspace admin configuration.

For Scala code, the CWD is the ephemeral storage attached to the driver.

For code in all other languages:

  • In Databricks Runtime 14.0 and above, the CWD is the directory containing the notebook or script being run. This is true regardless of whether the code is in /Workspace/Repos.
  • In Databricks Runtime 13.3 LTS and below, the CWD depends on whether the code is in /Workspace/Repos:
    • For code executed in a path outside of /Workspace/Repos, the CWD is the ephemeral storage volume attached to the driver.
    • For code executed in a path in /Workspace/Repos, the CWD depends on your admin's enableWorkspaceFilesystem setting and the cluster's Databricks Runtime version:
      • With enableWorkspaceFilesystem set to dbr8.4+ or true: on Databricks Runtime 8.4 and above, the CWD is the directory containing the notebook or script being run; below 8.4, it is the ephemeral storage volume attached to the driver.
      • With enableWorkspaceFilesystem set to dbr11.0+: on Databricks Runtime 11.0 and above, the CWD is the directory containing the notebook or script being run; below 11.0, it is the ephemeral storage volume attached to the driver.
      • With enableWorkspaceFilesystem set to false: the CWD is always the ephemeral storage volume attached to the driver.
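A quick way to see which behavior applies in a given environment is to print the CWD from a notebook cell. A minimal sketch (the /Workspace prefix check is an assumption based on the workspace paths described above):

```python
import os

# Inspect the current working directory reported by the runtime.
cwd = os.getcwd()

# Notebook-directory behavior typically surfaces as a /Workspace path;
# anything else suggests ephemeral driver storage (assumption).
if cwd.startswith("/Workspace"):
    print(f"CWD is the notebook directory: {cwd}")
else:
    print(f"CWD is likely ephemeral driver storage: {cwd}")
```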

Get the CWD in your code

To get the CWD in a notebook, call os.getcwd() from Python's built-in os module for file system interaction. Import the module at the beginning of your notebook with import os. For example:

import os
...
cwd = os.getcwd()

You can also set the CWD by calling os.chdir('/path/to/dir') at the beginning of your pipeline notebook. You can only set the CWD when you are running a notebook from your workspace with the workspace filesystem (WSFS) enabled.
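For example, a minimal sketch that changes the CWD and restores it afterwards (/tmp is used purely as an illustrative target; substitute any writable directory in your environment):

```python
import os

# Remember the original CWD so it can be restored later.
original_cwd = os.getcwd()

# Change the working directory; relative paths now resolve here.
os.chdir("/tmp")
print(os.getcwd())

# Restore the original working directory.
os.chdir(original_cwd)
```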

How does this impact workloads?

The biggest impacts on workloads concern file persistence and location:

  • In Databricks Runtime 13.3 LTS and below, for code executed in a path outside of /Workspace/Repos, many code snippets store data to a default location on an ephemeral storage volume that is permanently deleted when the cluster is terminated.
  • In Databricks Runtime 14.0 and above, the default behavior for these operations creates workspace files stored alongside the running notebook that persist until explicitly deleted.
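To see where relative-path writes land, you can resolve the path against the CWD. A short sketch (the filename example_output.txt is hypothetical):

```python
import os

# A relative path resolves against the CWD, so where this file persists
# depends on the runtime behavior described above.
with open("example_output.txt", "w") as f:
    f.write("hello")

# The file was created inside the current working directory.
full_path = os.path.join(os.getcwd(), "example_output.txt")
print(full_path)

# Clean up the demo file.
os.remove("example_output.txt")
```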

For notes on performance differences and other limitations inherent in workspace files, see Work with workspace files.

Revert to legacy behavior

You can change the current working directory for any notebook using the Python method os.chdir(). If you want to ensure that each notebook uses a CWD on the ephemeral storage volumes attached to the driver, you can add the following command to the first cell of each notebook and run it before any other code:

import os

os.chdir("/tmp")