Drop tables with corrupted metadata from the metastore

This article explains how to drop tables with corrupted metadata.

Problem

Sometimes you cannot drop a table from the Azure Databricks UI. Using %sql or spark.sql to drop the table does not work either.
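
For example, a drop attempt like either of the following fails with the exception shown in the Cause section (my_table is a placeholder table name):

%sql
DROP TABLE my_table

// Scala equivalent
spark.sql("DROP TABLE my_table")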

Cause

The metadata (table schema) stored in the metastore is corrupted. When you run the DROP TABLE command, Spark checks whether the table exists before dropping it. Because the table's metadata is corrupted, Spark cannot drop the table and fails with the following exception.

com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: The metadata is corrupted

Solution

Use a Hive client to drop the table, since the Hive client does not check for table existence the way Spark does. To drop the table:

  1. Create a function inside the org.apache.spark.sql.hive package.
package org.apache.spark.sql.hive {
  import org.apache.spark.SparkContext

  // Placing this object inside org.apache.spark.sql.hive gives it access to
  // the private[spark] HiveUtils helper.
  object utils {
    def dropTable(sc: SparkContext, dbName: String, tableName: String, ignoreIfNotExists: Boolean, purge: Boolean): Unit = {
      // Drop the table through a Hive client directly, bypassing the
      // existence/metadata check that makes Spark's DROP TABLE fail.
      HiveUtils
        .newClientForMetadata(sc.getConf, sc.hadoopConfiguration)
        .dropTable(dbName, tableName, ignoreIfNotExists, purge) // pass purge through instead of hardcoding false
    }
  }
}
  2. Drop the corrupted table.
import org.apache.spark.sql.hive.utils

// Arguments: SparkContext, database name, table name, ignoreIfNotExists, purge
utils.dropTable(sc, "default", "my_table", true, true)
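
As an optional sanity check, you can confirm the table no longer appears in the metastore (assuming the default database, as above):

spark.sql("SHOW TABLES IN default").show()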