Use a secret in a Spark configuration property or environment variable

This article describes how to reference a secret in a Spark configuration property or environment variable. Retrieved secrets are redacted from notebook output and from Spark driver and executor logs.

Important

This feature is in Public Preview.

Security considerations

Databricks does not recommend storing secrets in cluster environment variables if they must not be available to all users on the cluster. Keep the following security implications in mind when referencing secrets in a Spark configuration property or environment variable:

  • Any user with CAN ATTACH TO permissions on a cluster or Run permissions on a notebook can read cluster environment variables from within the notebook.

  • If table access control is not enabled on a cluster, any user with CAN ATTACH TO permissions on a cluster or Run permissions on a notebook can read Spark configuration properties from within the notebook. This includes users who do not have direct permission to read a secret.

  • Secrets are not redacted from the Spark driver log stdout and stderr streams. To protect sensitive data, by default, Spark driver logs are viewable only by users with CAN MANAGE permission on job, single user access mode, and shared access mode clusters.

    On No Isolation Shared access mode clusters, the Spark driver logs can be viewed by users with CAN ATTACH TO or CAN MANAGE permission. To limit who can read the logs to only users with the CAN MANAGE permission, set spark.databricks.acl.needAdminPermissionToViewLogs to true.
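
    For example, in the cluster's Spark configuration:

    spark.databricks.acl.needAdminPermissionToViewLogs true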

Requirements

The following requirements apply to referencing secrets in Spark configuration properties and environment variables:

  • Cluster owners must have CAN READ permission on the secret scope (see the CLI example after this list).
  • Only cluster owners can add or edit a secret reference in a Spark configuration property or environment variable.
  • If a secret is updated, you must restart your cluster to fetch the secret again.
  • You must have the CAN MANAGE permission on the cluster to delete a Spark configuration property or environment variable that references a secret.
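
For example, to grant a cluster owner READ permission on a secret scope, you can use the Databricks CLI. A minimal sketch assuming the legacy CLI syntax; the scope name and principal are placeholders, and newer CLI versions take positional arguments instead of flags:

databricks secrets put-acl --scope scope1 --principal owner@example.com --permission READ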

Reference a secret with a Spark configuration property

You specify a reference to a secret in a Spark configuration property in the following format:

spark.<property-name> {{secrets/<scope-name>/<secret-name>}}

Replace:

  • <scope-name> with the name of the secret scope.
  • <secret-name> with the unique name of the secret in the scope.
  • <property-name> with the name of the Spark configuration property.

Each Spark configuration property can only reference one secret, but you can configure multiple Spark properties to reference secrets.

For example:

spark.password {{secrets/scope1/key1}}

To fetch the secret in the notebook and use it:

Python

spark.conf.get("spark.password")

SQL

SELECT ${spark.password};
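
The value returned by spark.conf.get is the resolved secret, so you can pass it to APIs that need the credential. A minimal Python sketch that uses it for a JDBC read; the JDBC URL, table, and user name are hypothetical:

Python

password = spark.conf.get("spark.password")

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/sales")  # hypothetical host and database
    .option("dbtable", "public.orders")  # hypothetical table
    .option("user", "etl_user")  # hypothetical user
    .option("password", password)
    .load()
)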

Reference a secret in an environment variable

You specify a secret path in an environment variable in the following format:

<variable-name>={{secrets/<scope-name>/<secret-name>}}

You can use any valid variable name when you reference a secret. Access to secrets referenced in environment variables is determined by the permissions of the user who configured the cluster. Although secrets stored in environment variables are accessible to all cluster users, they are redacted from plaintext display, similar to other secret references.

Environment variables that reference secrets are accessible from a cluster-scoped init script. See Set and use environment variables with init scripts.

For example, you set an environment variable to reference a secret:

SPARKPASSWORD={{secrets/scope1/key1}}

To fetch the secret in an init script, access $SPARKPASSWORD using the following pattern:

if [ -n "$SPARKPASSWORD" ]; then
  # code to use ${SPARKPASSWORD}
fi
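
Because cluster environment variables are also readable from notebooks attached to the cluster (see Security considerations above), you can read the same variable from notebook code. A minimal Python sketch using the SPARKPASSWORD variable from the example above:

Python

import os

# Returns None if the variable is not set on this cluster.
password = os.environ.get("SPARKPASSWORD")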