Troubleshooting JDBC/ODBC access to Azure Data Lake Storage Gen2

Problem

Note

In general, you should use Databricks Runtime 5.2 and above, which include a built-in Azure Blob File System (ABFS) driver, when you want to access Azure Data Lake Storage Gen2 (ADLS Gen2). This article applies to users who access ADLS Gen2 storage over JDBC/ODBC instead.
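For context, direct access with the built-in ABFS driver is configured through the standard Hadoop ABFS OAuth settings and `abfss://` URIs. A minimal sketch of the cluster-level Spark configuration is shown below; the storage account, service principal, and tenant values are placeholders, and secrets should normally come from a secret scope rather than plain text:

```
fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net <client-secret>
fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
```

With these settings in place, data is addressed as `abfss://<container>@<storage-account>.dfs.core.windows.net/<path>`.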

When you run a SQL query from a JDBC or ODBC client to access ADLS Gen2, the following error occurs:

com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: No value for dfs.adls.oauth2.access.token.provider found in conf file.

18/10/23 21:03:28 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.util.concurrent.ExecutionException: java.io.IOException: There is no primary group for UGI (Basic token)chris.stevens+dbadmin (auth:SIMPLE)
  at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
  at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
  at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
  at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2344)
  at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
  at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
  at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
  at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:158)
  at org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:257)
  at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:313)
  at
  at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
  at scala.collection.immutable.List.foldLeft(List.scala:84)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:87)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:79)

When you run the query from the SQL client, you get the following error:

An error occurred when executing the SQL command:
select * from test_databricks limit 50

[Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: com.google.common.util.concurrent.UncheckedExecutionException: com.databricks.backend.daemon.data.common.InvalidMountException: Error while using path /mnt/crm_gen2/phonecalls for resolving path '/phonecalls' within mount at '/mnt/crm_gen2'., Query: SELECT * FROM `default`.`test_databricks` `default_test_databricks` LIMIT 50. [SQL State=HY000, DB Errorcode=500051]

Warnings:
[Simba][SparkJDBCDriver](500100) Error getting table information from database.

Cause

The root cause is an incorrect cluster configuration: the JDBC or ODBC connection cannot reach ADLS Gen2 through the ABFS driver, which causes queries to fail.

Solution

Set spark.hadoop.hive.server2.enable.doAs to false in the cluster configuration settings.
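In the cluster's Spark configuration, the entry looks like this (the key name is from this article; the placement follows standard Spark configuration syntax):

```
spark.hadoop.hive.server2.enable.doAs false
```

With doAs disabled, Hive Server 2 runs queries as the cluster's own service identity instead of impersonating the connecting JDBC/ODBC user, so the ADLS Gen2 credentials configured on the cluster are used for the query.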