CACHE (Delta Lake on Azure Databricks)

Caches the data accessed by the specified simple SELECT query in the Delta cache. You can cache a subset of columns by providing a list of column names, and a subset of rows by providing a predicate. This lets subsequent queries avoid scanning the original files as much as possible. This construct applies only to Parquet tables. Views are also supported, but the expanded queries are restricted to simple queries, as described above.


CACHE SELECT column_name[, column_name, ...] FROM table_identifier [ WHERE boolean_expression ]

See Delta and Apache Spark caching for the differences between the RDD cache and the Databricks IO cache.

  • table_identifier
    • [database_name.] table_name: A table name, optionally qualified with a database name.
    • delta.`<path-to-table>`: The location of an existing Delta table.


CACHE SELECT width, length FROM boxes WHERE height=3
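
The statement above can also be assembled programmatically, for example when the column list or predicate varies between jobs. The following is a minimal sketch in Python, not part of the Databricks API: `build_cache_select` is a hypothetical helper that only formats text according to the syntax shown earlier, and it does no escaping or validation, so it should be given trusted input only.

```python
from typing import Optional, Sequence


def build_cache_select(columns: Sequence[str], table: str,
                       predicate: Optional[str] = None) -> str:
    """Assemble a CACHE SELECT statement following the documented syntax.

    Hypothetical helper (an assumption for illustration): it formats text
    only and performs no escaping or validation of identifiers.
    """
    if not columns:
        raise ValueError("at least one column name is required")
    # Column list and table identifier are mandatory parts of the statement.
    statement = f"CACHE SELECT {', '.join(columns)} FROM {table}"
    # The WHERE clause is optional and restricts the rows that are cached.
    if predicate:
        statement += f" WHERE {predicate}"
    return statement


if __name__ == "__main__":
    stmt = build_cache_select(["width", "length"], "boxes", "height=3")
    print(stmt)
    # In a Databricks notebook, where a SparkSession named `spark` is
    # predefined, the statement could then be run with:
    #   spark.sql(stmt)
```

Omitting the predicate yields a statement that caches the listed columns for all rows of the table.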