How to create table DDLs to import into an external metastore

Azure Databricks supports using external metastores instead of the default Hive metastore.

You can export all table metadata from Hive to the external metastore.

  1. Use the Apache Spark Catalog API to list the tables in the databases contained in the metastore.
  2. Use the SHOW CREATE TABLE statement to generate the DDLs and store them in a file.
  3. Use the file to import the table DDLs into the external metastore (a sketch of this step follows the code below).

The following code accomplishes the first two steps.

dbs = spark.catalog.listDatabases()
for db in dbs:
  # Write one DDL file per database; the context manager closes each file
  # before moving on (the original code only closed the last file opened).
  with open("your_file_name_{}.ddl".format(db.name), "w") as f:
    tables = spark.catalog.listTables(db.name)
    for t in tables:
      # SHOW CREATE TABLE returns one row whose single column holds the statement.
      DDL = spark.sql("SHOW CREATE TABLE {}.{}".format(db.name, t.name))
      f.write(DDL.first()[0])
      f.write("\n")
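Each SHOW CREATE TABLE call returns a single row with a single column containing the complete CREATE TABLE statement, so every table contributes one statement (which may span multiple lines) to its database's file.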

You can use the resulting file to import the table DDLs into the external metastore.
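The following is a minimal sketch of the import step, run on a cluster configured to use the external metastore. Because an exported statement can span multiple lines, it assumes each statement in the file was terminated with a semicolon (for example, by writing ";\n" instead of "\n" in the export loop above); the file name is a placeholder.

with open("your_file_name_default.ddl", "r") as f:
  # Split the file back into individual statements on the assumed
  # semicolon delimiter, dropping empty fragments.
  statements = [s.strip() for s in f.read().split(";") if s.strip()]

for stmt in statements:
  # Each statement recreates one table definition in the external metastore.
  spark.sql(stmt)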