Encrypt traffic between cluster worker nodes

Note

This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

In a typical Azure Databricks data processing workflow, a user query or transformation is sent to your clusters over an encrypted channel. The data exchanged between cluster worker nodes, however, is not encrypted by default. If your environment requires that data be encrypted at all times, whether at rest or in transit, you can create an init script that configures your clusters to encrypt traffic between worker nodes, using AES 128-bit encryption over a TLS 1.2 connection.

Note

Although AES enables cryptographic routines to take advantage of hardware acceleration, there is nonetheless a performance penalty compared to unencrypted traffic. Depending on the amount of shuffle data, throughput between nodes can decrease, causing queries to take longer on an encrypted cluster.

To enable encryption for traffic between worker nodes, create a cluster-scoped init script (or a global init script, if you want all clusters in your workspace to use worker-to-worker encryption) that sets the Spark configuration.
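Once the script is saved (for example, to DBFS), it can be attached to a cluster as a cluster-scoped init script. A minimal sketch of the relevant fragment of a Clusters API request body, assuming a hypothetical script path of `dbfs:/databricks/init-scripts/encrypt-worker-traffic.sh`:

```json
{
  "init_scripts": [
    {
      "dbfs": {
        "destination": "dbfs:/databricks/init-scripts/encrypt-worker-traffic.sh"
      }
    }
  ]
}
```

The same path can also be entered under Advanced Options > Init Scripts when editing the cluster in the UI.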

Get the keystore file and password

The JKS keystore file used for enabling SSL/HTTPS is generated dynamically for each workspace. The password of the JKS keystore file is hardcoded and is not intended to protect the confidentiality of the keystore. Do not assume that the keystore file itself is protected.

#!/bin/bash

keystore_file="$DB_HOME/keys/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"

# Use the SHA256 of the JKS keystore file as a SASL authentication secret string
sasl_secret=$(sha256sum "$keystore_file" | cut -d' ' -f1)

spark_defaults_conf="$DB_HOME/spark/conf/spark-defaults.conf"
driver_conf="$DB_HOME/driver/conf/config.conf"

if [ ! -e "$spark_defaults_conf" ] ; then
    touch "$spark_defaults_conf"
fi
if [ ! -e "$driver_conf" ] ; then
    touch "$driver_conf"
fi
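Every node derives the same SASL secret independently, because the secret is simply the SHA-256 digest of the shared keystore file. A standalone sketch of that derivation, using a throwaway file in place of the real keystore:

```shell
# Throwaway stand-in for $DB_HOME/keys/jetty_ssl_driver_keystore.jks.
keystore_file=$(mktemp)
printf 'example keystore bytes' > "$keystore_file"

# Same derivation as the init script: SHA-256 digest, first field only.
sasl_secret=$(sha256sum "$keystore_file" | cut -d' ' -f1)

echo "secret length: ${#sasl_secret}"   # a SHA-256 hex digest is 64 characters
rm -f "$keystore_file"
```

Any node with the same keystore bytes computes an identical secret, which is what allows the workers to authenticate each other without distributing a separate shared key.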

Set the executor configuration

# Authenticate
echo "spark.authenticate true" >> $spark_defaults_conf
echo "spark.authenticate.secret $sasl_secret" >> $spark_defaults_conf

# Configure AES encryption
echo "spark.network.crypto.enabled true" >> $spark_defaults_conf
echo "spark.network.crypto.saslFallback false" >> $spark_defaults_conf

# Configure SSL
echo "spark.ssl.enabled true" >> $spark_defaults_conf
echo "spark.ssl.keyPassword $keystore_password" >> $spark_defaults_conf
echo "spark.ssl.keyStore $keystore_file" >> $spark_defaults_conf
echo "spark.ssl.keyStorePassword $keystore_password" >> $spark_defaults_conf
echo "spark.ssl.protocol TLSv1.2" >> $spark_defaults_conf
echo "spark.ssl.standalone.enabled true" >> $spark_defaults_conf
echo "spark.ssl.ui.enabled true" >> $spark_defaults_conf

Set the driver configuration

# Seed the driver config with the existing spark-branch.conf, minus its final line.
head -n -1 "${DB_HOME}/driver/conf/spark-branch.conf" > "$driver_conf"

# Authenticate
echo "spark.authenticate true" >> $driver_conf
echo "spark.authenticate.secret $sasl_secret" >> $driver_conf

# Configure AES encryption
echo "spark.network.crypto.enabled true" >> $driver_conf
echo "spark.network.crypto.saslFallback false" >> $driver_conf

# Configure SSL
echo "spark.ssl.enabled true" >> $driver_conf
echo "spark.ssl.keyPassword $keystore_password" >> $driver_conf
echo "spark.ssl.keyStore $keystore_file" >> $driver_conf
echo "spark.ssl.keyStorePassword $keystore_password" >> $driver_conf
echo "spark.ssl.protocol TLSv1.2" >> $driver_conf
echo "spark.ssl.standalone.enabled true" >> $driver_conf
echo "spark.ssl.ui.enabled true" >> $driver_conf

mv $driver_conf ${DB_HOME}/driver/conf/spark-branch.conf

Once initialization of the driver and worker nodes is complete, all traffic between these nodes is encrypted.
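To spot-check that the settings landed, you can grep the generated configuration on a running node (for example, in a `%sh` notebook cell against `$DB_HOME/spark/conf/spark-defaults.conf`). The sketch below runs the same check against a locally generated sample file so that it works anywhere:

```shell
# Sample file standing in for $DB_HOME/spark/conf/spark-defaults.conf on a node.
conf=$(mktemp)
{
  echo "spark.authenticate true"
  echo "spark.network.crypto.enabled true"
  echo "spark.ssl.enabled true"
  echo "spark.ssl.protocol TLSv1.2"
} > "$conf"

# All three families of settings must be present for worker-to-worker encryption.
status="missing"
if grep -q '^spark.authenticate true' "$conf" &&
   grep -q '^spark.network.crypto.enabled true' "$conf" &&
   grep -q '^spark.ssl.enabled true' "$conf"; then
  status="present"
fi
echo "encryption settings: $status"
rm -f "$conf"
```

On a real cluster, point the `grep` calls at the actual `spark-defaults.conf`; if any of the settings are missing, the init script did not run or failed partway through.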