WATERMARK 子句

适用于:勾选“是” Databricks SQL 勾选“是” Databricks Runtime 12.0 及更高版本

在 select 语句中向关系添加水印。 WATERMARK 子句仅适用于对有状态流数据的查询(包括流-流联接和聚合)。

语法

from_item
{ table_name [ TABLESAMPLE clause ] [ watermark_clause ] [ table_alias ] |
  JOIN clause |
  [ LATERAL ] table_valued_function [ table_alias ] |
  VALUE clause |
  [ LATERAL ] ( query ) [ TABLESAMPLE clause ] [ watermark_clause ] [ table_alias ] }

watermark_clause
  WATERMARK named_expression DELAY OF interval

参数

  • named_expression

    一个表达式,它提供一个 timestamp 类型的值。 该表达式必须是对现有列的引用,或者是基于现有列的确定性转换。 该表达式添加一个时间戳类型的列,用于跟踪水印。 添加的列可用于查询。

  • interval_clause

    间隔文本,定义水印的延迟阈值。 必须是小于一个月的正值。

示例

-- Creating a streaming table performing time window row count, with defining watermark from existing column
> CREATE OR REFRESH STREAMING TABLE window_agg_1
  AS SELECT window(ts, '10 seconds') as w, count(*) as CNT
  FROM
  STREAM stream_source WATERMARK ts DELAY OF INTERVAL 10 SECONDS AS stream
  GROUP BY window(ts, '10 seconds');

-- Creating a streaming table performing time window row count, with deriving a new timestamp column to define watermark
> CREATE OR REFRESH STREAMING TABLE window_agg_2
  AS SELECT window(ts, '10 seconds') as w, count(*) as CNT
  FROM
  STREAM stream_source WATERMARK to_timestamp(ts_str) AS ts DELAY OF INTERVAL 10 SECONDS AS stream
  GROUP BY window(ts, '10 seconds');