Listing table names
This article explains why spark.catalog.listTables() and %sql show tables have different performance characteristics.
Problem
To fetch all the table names from the metastore, you can use either spark.catalog.listTables() or %sql show tables.
If you compare how long each call takes to return, spark.catalog.listTables() usually takes longer than %sql show tables.
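One way to observe the difference is to time both calls from a Python notebook cell. The following is a minimal sketch, not a benchmark: it assumes an active SparkSession named spark (as Databricks notebooks provide), uses spark.sql("SHOW TABLES IN ...") as the Python counterpart of %sql show tables, and the helper name compare_listing and the database name "default" are hypothetical.

```python
import time


def compare_listing(spark, database="default"):
    """Time both ways of listing tables in `database` (hypothetical helper).

    `spark` is assumed to be an active SparkSession.
    """
    # Catalog API: returns Table objects, each with a `name` attribute.
    start = time.perf_counter()
    catalog_names = [t.name for t in spark.catalog.listTables(database)]
    catalog_seconds = time.perf_counter() - start

    # SQL command: returns rows that include a `tableName` column.
    start = time.perf_counter()
    rows = spark.sql(f"SHOW TABLES IN {database}").collect()
    sql_names = [row.tableName for row in rows]
    sql_seconds = time.perf_counter() - start

    return {
        "catalog_names": catalog_names,
        "catalog_seconds": catalog_seconds,
        "sql_names": sql_names,
        "sql_seconds": sql_seconds,
    }
```

In a notebook you could call compare_listing(spark, "default") and compare the two *_seconds values. A single run is noisy, so repeat the call a few times before drawing conclusions.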
Cause
spark.catalog.listTables() tries to fetch every table's metadata first and only then returns the requested table names. This process is slow when the tables have complex schemas or when there are many tables.
Solution
To get only the table names, use %sql show tables. It internally invokes SessionCatalog.listTables, which fetches only the table names.
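If the names are needed in Python code rather than in a %sql cell, the same SQL command can be issued through spark.sql. This is a minimal sketch under a few assumptions: spark is an active SparkSession, the result exposes a tableName column (as SHOW TABLES does in current Spark releases), and the helper name list_table_names and the database name "default" are hypothetical.

```python
def list_table_names(spark, database="default"):
    """Return only the table names in `database` (hypothetical helper).

    `spark` is assumed to be an active SparkSession. This runs SHOW TABLES
    instead of calling spark.catalog.listTables().
    """
    rows = spark.sql(f"SHOW TABLES IN {database}").collect()
    return [row.tableName for row in rows]
```

For example, names = list_table_names(spark, "default") returns a plain Python list of strings that can be filtered or iterated over.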