Databricks Utilities with Databricks Connect for Python

Note

This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.

This article describes how to use Databricks Utilities with Databricks Connect for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Azure Databricks clusters. See What is Databricks Connect? For the Scala version of this article, see Databricks Utilities with Databricks Connect for Scala.

Note

Before you begin to use Databricks Connect, you must set up the Databricks Connect client.

You use Databricks Connect to access Databricks Utilities as follows:

  • Use the WorkspaceClient class's dbutils variable to access Databricks Utilities. The WorkspaceClient class belongs to the Databricks SDK for Python and is included in Databricks Connect.
  • Use dbutils.fs to access the Databricks Utilities fs utility.
  • Use dbutils.secrets to access the Databricks Utilities secrets utility.
  • No other Databricks Utilities functionality is available through dbutils.
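
The access pattern in the preceding list can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes an object that exposes fs.ls and secrets.get in the same way as the dbutils variable of WorkspaceClient. The helper name summarize_utilities and the directory, scope, and key placeholders in the commented usage are hypothetical and do not come from this article.

```python
def summarize_utilities(dbutils, directory, scope, key):
    """Return the entry names in `directory` and whether a secret is non-empty.

    `dbutils` is expected to behave like WorkspaceClient().dbutils, that is,
    to expose fs.ls(path) and secrets.get(scope, key).
    """
    # dbutils.fs gives access to the fs utility.
    names = [entry.name for entry in dbutils.fs.ls(directory)]
    # dbutils.secrets gives access to the secrets utility.
    secret_value = dbutils.secrets.get(scope, key)
    # Report only whether the secret exists; never print or log its value.
    return names, bool(secret_value)


# Hypothetical usage against a real workspace (placeholders, not run here):
#
#   from databricks.sdk import WorkspaceClient
#   w = WorkspaceClient()
#   names, has_secret = summarize_utilities(
#       w.dbutils, "/Volumes/main/default/my-volume", "<scope-name>", "<key-name>")
```

Passing dbutils in as a parameter keeps the helper independent of how WorkspaceClient is authenticated.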

Tip

You can also use the included Databricks SDK for Python to access any available Databricks REST API, not just the preceding Databricks Utilities APIs. See databricks-sdk on PyPI.

To initialize WorkspaceClient, you must provide enough information to authenticate the Databricks SDK with the workspace. For example, you can:

  • Hard-code the workspace URL and your access token directly within your code, and then initialize WorkspaceClient as follows. This option is supported, but Databricks does not recommend it, because it can expose sensitive information, such as access tokens, if your code is checked into version control or otherwise shared:

    from databricks.sdk import WorkspaceClient
    
    w = WorkspaceClient(host  = f"https://{retrieve_workspace_instance_name()}",
                        token = retrieve_token())
    
  • Create or specify a configuration profile that contains the fields host and token, and then initialize WorkspaceClient as follows:

    from databricks.sdk import WorkspaceClient
    
    w = WorkspaceClient(profile = "<profile-name>")
    
  • Set the environment variables DATABRICKS_HOST and DATABRICKS_TOKEN in the same way you set them for Databricks Connect, and then initialize WorkspaceClient as follows:

    from databricks.sdk import WorkspaceClient
    
    w = WorkspaceClient()
    

The Databricks SDK for Python does not recognize the SPARK_REMOTE environment variable for Databricks Connect.
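
Because the SDK ignores SPARK_REMOTE, an environment that is configured only for Databricks Connect might not be enough to authenticate WorkspaceClient through environment variables. The following sketch checks for that situation before a client is created. The helper names missing_sdk_auth_vars and explain_auth_state are hypothetical; they only inspect a mapping such as os.environ and do not call the SDK. The check covers only the environment-variable option, so a configuration profile or another authentication method might still work when it reports missing variables.

```python
import os

# Variables the environment-variable option described above relies on.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_sdk_auth_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def explain_auth_state(env=None):
    """Describe whether environment-variable authentication looks usable."""
    env = os.environ if env is None else env
    missing = missing_sdk_auth_vars(env)
    if not missing:
        return "ok"
    if env.get("SPARK_REMOTE"):
        # SPARK_REMOTE configures Databricks Connect only; the SDK ignores it.
        return "SPARK_REMOTE is set, but the SDK also needs: " + ", ".join(missing)
    return "missing: " + ", ".join(missing)
```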

For additional Azure Databricks authentication options for the Databricks SDK for Python, as well as how to initialize AccountClient within the Databricks SDKs to access available Databricks REST APIs at the account level instead of at the workspace level, see databricks-sdk on PyPI.

The following example shows how to use the Databricks SDK for Python to automate Databricks Utilities. This example creates a file named zzz_hello.txt in a Unity Catalog volume's path within the workspace, reads the data from the file, and then deletes the file. This example assumes that the environment variables DATABRICKS_HOST and DATABRICKS_TOKEN have already been set:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

file_path = "/Volumes/main/default/my-volume/zzz_hello.txt"
file_data = "Hello, Databricks!"
fs = w.dbutils.fs

fs.put(
  file      = file_path,
  contents  = file_data,
  overwrite = True
)

print(fs.head(file_path))

fs.rm(file_path)
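
If the read step in the preceding example raises an exception, the file is left behind in the volume. One possible variation, assuming only the fs.put, fs.head, and fs.rm calls already used above, wraps the read in try/finally so that the cleanup always runs. The function name round_trip is hypothetical.

```python
def round_trip(fs, file_path, file_data):
    """Write file_data to file_path, read it back, and always delete the file."""
    fs.put(file=file_path, contents=file_data, overwrite=True)
    try:
        return fs.head(file_path)
    finally:
        # Runs whether or not head() raised, so the file is not left behind.
        fs.rm(file_path)


# Hypothetical usage (requires an authenticated workspace, not run here):
#
#   from databricks.sdk import WorkspaceClient
#   print(round_trip(WorkspaceClient().dbutils.fs,
#                    "/Volumes/main/default/my-volume/zzz_hello.txt",
#                    "Hello, Databricks!"))
```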

See also Interaction with dbutils in the Databricks SDK for Python documentation.