Listing table names

This article explains why spark.catalog.listTables() and %sql show tables have different performance characteristics.

Problem

To fetch all the table names from the metastore, you can use either spark.catalog.listTables() or %sql show tables. If you time the two calls, you will see that spark.catalog.listTables() usually takes longer than %sql show tables.
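One way to observe the difference is to time both calls from a Python notebook cell. The following is a minimal sketch, not a benchmark harness: the helper names time_call and compare_listing are hypothetical, the database name "default" is only an example, and spark is assumed to be the SparkSession that the notebook already provides.

```python
import time


def time_call(fn):
    """Run fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare_listing(spark, database="default"):
    """Time both ways of listing the tables of `database`.

    `spark` is assumed to be an existing SparkSession; the function
    names here are illustrative and not part of any Spark API.
    """
    # Catalog API: returns a Python list of Table objects.
    catalog_tables, catalog_secs = time_call(
        lambda: spark.catalog.listTables(database)
    )
    # SQL command: collect() forces execution and returns one Row per table.
    sql_rows, sql_secs = time_call(
        lambda: spark.sql(f"SHOW TABLES IN {database}").collect()
    )
    return {
        "catalog_count": len(catalog_tables),
        "catalog_seconds": catalog_secs,
        "sql_count": len(sql_rows),
        "sql_seconds": sql_secs,
    }
```

Calling compare_listing(spark, "default") in a notebook returns both durations side by side. A single run is noisy, so repeating the call a few times gives a more reliable picture.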

Cause

spark.catalog.listTables() first tries to fetch the metadata of every table and only then returns the requested table names. This process is slow when the schemas are complex or the number of tables is large.

Solution

To get only the table names, use %sql show tables. Internally it invokes SessionCatalog.listTables, which fetches only the table names.
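If the names are needed in Python code rather than in a %sql cell, the same command can be issued through spark.sql. This is a sketch under a few assumptions: the helper name table_names is hypothetical, spark is assumed to be the notebook's SparkSession, and the result is assumed to expose a tableName column, as the SHOW TABLES output does in current Spark releases.

```python
def table_names(spark, database="default"):
    """Return the table names of `database` as a list of strings.

    Runs SHOW TABLES through spark.sql instead of calling
    spark.catalog.listTables(). `table_names` is an illustrative
    helper, not a Spark API.
    """
    rows = spark.sql(f"SHOW TABLES IN {database}").collect()
    # Each Row is assumed to carry a `tableName` field.
    return [row["tableName"] for row in rows]
```

For example, table_names(spark, "default") returns a plain list that can be filtered or looped over without requesting the full metadata of each table.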