Access Azure Data Lake Storage Gen2 directly using a SAS token provider

You can use storage shared access signatures (SAS) to access an Azure Data Lake Storage Gen2 storage account directly. With SAS, you can restrict access to a storage account using temporary tokens with fine-grained access control.

You can add multiple storage accounts, each with its own SAS token provider, in the same Spark session.

Important

SAS support is available in Databricks Runtime 7.5 and above. This is an experimental feature for advanced users.

Implement a SAS token provider

To use SAS to access Azure Data Lake Storage Gen2, you must provide a Java or Scala implementation of the SASTokenProvider interface, one of the extension points offered by ABFS. For more information on these extension points, see the Extensibility section of the Hadoop Azure documentation.

The interface has the following methods:

package org.apache.hadoop.fs.azurebfs.extensions;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.AccessControlException;

public interface SASTokenProvider {
  /**
   * Initialize authorizer for Azure Blob File System.
   * @param configuration Configuration object.
   * @param accountName Account Name.
   * @throws IOException network problems or similar.
   */
  void initialize(Configuration configuration, String accountName)
      throws IOException;

  /**
   * Invokes the authorizer to obtain a SAS token.
   *
   * @param account the name of the storage account.
   * @param fileSystem the name of the fileSystem.
   * @param path the file or directory path.
   * @param operation the operation to be performed on the path.
   * @return a SAS token to perform the request operation.
   * @throws IOException if there is a network error.
   * @throws AccessControlException if access is denied.
   */
  String getSASToken(String account, String fileSystem, String path, String operation)
      throws IOException, AccessControlException;
}

For an example implementation of the SASTokenProvider interface, see the MockSASTokenProvider class in the Apache Hadoop repository.

The class that implements the SASTokenProvider interface must be available at runtime. You can do that by providing the implementation directly in a notebook as a package cell, or by attaching a JAR containing the class.
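For instance, the following is a minimal Scala sketch that could be defined in a package cell. The package, class name, and the fs.azure.sas.fixed.token configuration key are hypothetical, and serving a single pre-generated token is only suitable for experimentation; a production provider would typically issue short-lived tokens scoped to the requested path and operation:

package com.example.sas

import java.io.IOException

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.azurebfs.extensions.SASTokenProvider

// Sketch implementation: serves one pre-generated SAS token read from
// the Hadoop configuration. Class name and config key are hypothetical.
class FixedSASTokenProvider extends SASTokenProvider {

  private var sasToken: String = _

  @throws[IOException]
  override def initialize(configuration: Configuration, accountName: String): Unit = {
    // Hypothetical key: load a pre-generated token from the configuration.
    sasToken = configuration.get("fs.azure.sas.fixed.token")
    if (sasToken == null) {
      throw new IOException(s"No SAS token configured for account $accountName")
    }
  }

  override def getSASToken(account: String, fileSystem: String, path: String, operation: String): String = {
    // Returns the same token for every request. A production provider
    // should issue tokens restricted to the given path and operation.
    sasToken
  }
}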

Configure the SAS token provider

You can register the implementation using the following Spark configurations:

spark.hadoop.fs.azure.account.auth.type.<storage-account-name>.dfs.core.chinacloudapi.cn SAS
spark.hadoop.fs.azure.sas.token.provider.type.<storage-account-name>.dfs.core.chinacloudapi.cn <class-name>

where <class-name> is the fully qualified class name of your SASTokenProvider implementation.
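For example, with the hypothetical FixedSASTokenProvider sketch above and a storage account named mystorageaccount, the configuration would be:

spark.hadoop.fs.azure.account.auth.type.mystorageaccount.dfs.core.chinacloudapi.cn SAS
spark.hadoop.fs.azure.sas.token.provider.type.mystorageaccount.dfs.core.chinacloudapi.cn com.example.sas.FixedSASTokenProvider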

DataFrame or Dataset API

If you are using the Spark DataFrame or Dataset APIs, Databricks recommends that you set the SAS configuration in your notebook's session configs:

spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.chinacloudapi.cn", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage-account-name>.dfs.core.chinacloudapi.cn", "<class-name>")