Introduction to Stream Analytics windowing functions

In time-streaming scenarios, performing operations on the data contained in temporal windows is a common pattern. Stream Analytics has native support for windowing functions, enabling developers to author complex stream processing jobs with minimal effort.

There are five kinds of temporal windows to choose from: Tumbling, Hopping, Sliding, Session, and Snapshot windows. You use the window functions in the GROUP BY clause of the query syntax in your Stream Analytics jobs. You can also aggregate events over multiple windows using the Windows() function.

All windowing operations output results at the end of the window. When you start a Stream Analytics job, you can specify the job output start time, and the system automatically fetches previous events in the incoming streams to output the first window at the specified time. For example, when you start with the Now option, the job starts to emit data immediately. The output of a window is a single event based on the aggregate function used. That output event carries the timestamp of the end of the window, and all window functions are defined with a fixed length.


Tumbling window

Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them, such as the example below. The key differentiators of Tumbling windows are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.
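To make the non-overlapping behavior concrete, here is a minimal Python sketch that simulates a tumbling count over plain numeric timestamps. It is not Stream Analytics query syntax: the function name `tumbling_counts`, the alignment of windows to time 0, and the half-open `[start, end)` boundaries are assumptions made for illustration, and the service's exact boundary convention may differ.

```python
from collections import defaultdict


def tumbling_counts(timestamps, size):
    """Count events per tumbling window of length `size`.

    Assumptions for this sketch: windows are half-open [k*size, (k+1)*size),
    aligned to time 0, and each result is keyed by the window END time,
    since a window's output carries the end-of-window timestamp.
    """
    counts = defaultdict(int)
    for t in timestamps:
        start = (t // size) * size  # every event maps to exactly one window
        counts[start + size] += 1
    return dict(sorted(counts.items()))


events = [1, 3, 9, 10, 14, 27]
print(tumbling_counts(events, 10))
# {10: 3, 20: 2, 30: 1}
```

Because each event lands in exactly one window, the counts always add up to the number of events.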


Hopping window

Hopping window functions hop forward in time by a fixed period. It can help to think of them as Tumbling windows that can overlap and be emitted more often than the window size. Events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.
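The following Python sketch simulates hopping windows so the overlap is visible. As with any illustration here, it is a standalone simulation rather than Stream Analytics syntax; the name `hopping_counts`, the half-open `[end - size, end)` boundaries, and window ends falling on multiples of `hop` are assumptions.

```python
from collections import defaultdict


def hopping_counts(timestamps, size, hop):
    """Count events per hopping window of length `size` advancing by `hop`.

    Assumptions for this sketch: windows are half-open [end - size, end),
    window ends fall on multiples of `hop`, results are keyed by window end
    time, and only windows containing at least one event are reported.
    """
    counts = defaultdict(int)
    for t in timestamps:
        # The first window end strictly after t is the next multiple of hop.
        end = (t // hop + 1) * hop
        # Walk forward through every window that still contains t.
        while end - size <= t:
            counts[end] += 1
            end += hop
    return dict(sorted(counts.items()))


events = [1, 3, 9, 10, 14, 27]
print(hopping_counts(events, size=10, hop=5))
# {5: 2, 10: 3, 15: 3, 20: 2, 30: 1, 35: 1}
print(hopping_counts(events, size=10, hop=10))
# {10: 3, 20: 2, 30: 1}
```

With `hop=5` each event is counted in two overlapping windows; passing the same value for `size` and `hop` gives one window per event, matching the tumbling behavior described above.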


Sliding window

Sliding windows, unlike Tumbling or Hopping windows, output events only for points in time when the content of the window actually changes, in other words, when an event enters or exits the window. So every window has at least one event. As with Hopping windows, events can belong to more than one sliding window.
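A small Python simulation can show that output happens only at change points. This is a sketch under stated assumptions, not the service's implementation: the name `sliding_counts` is hypothetical, a window ending at time `T` is taken to cover `(T - size, T]`, and an event at time `t` is treated as entering at `t` and exiting at `t + size`.

```python
def sliding_counts(timestamps, size):
    """Emit (time, count) only at points where the window content changes.

    Assumptions for this sketch: the window ending at time T covers
    (T - size, T]; an event at time t enters at t and exits at t + size.
    Change points whose window would be empty are skipped, so every
    emitted window holds at least one event.
    """
    change_points = sorted(set(timestamps) | {t + size for t in timestamps})
    results = []
    for T in change_points:
        count = sum(1 for t in timestamps if T - size < t <= T)
        if count > 0:
            results.append((T, count))
    return results


print(sliding_counts([1, 3, 12], size=5))
# [(1, 1), (3, 2), (6, 1), (12, 1)]
```

The event at time 3 is counted both at `T = 3` and at `T = 6`, so it belongs to more than one sliding window. The change point at 8 is skipped because the window would be empty. The quadratic rescan keeps the sketch short; a real engine would update counts incrementally.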


Session window

Session window functions group events that arrive at similar times, filtering out periods of time where there is no data. A session window has three main parameters: timeout, maximum duration, and partitioning key (optional).


A session window begins when the first event occurs. If another event occurs within the specified timeout from the last ingested event, then the window extends to include the new event. Otherwise, if no events occur within the timeout, the window is closed at the timeout.

If events keep occurring within the specified timeout, the session window keeps extending until the maximum duration is reached. The maximum duration checking intervals are set to be the same size as the specified max duration. For example, if the max duration is 10, then the checks on whether the window exceeds the maximum duration happen at t = 0, 10, 20, 30, and so on.
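The timeout and maximum-duration rules can be simulated together in Python. This is one reading of the description, assumed for illustration: an event extends the session when its gap from the previous event is at most `timeout`, and the session is force-closed at the first check time (a multiple of `max_duration`) where it has been open for at least `max_duration`. The function name `session_windows` is hypothetical, and the exact boundary handling in Stream Analytics may differ.

```python
import math


def session_windows(timestamps, timeout, max_duration):
    """Group event times into session windows: list of (start, end, events).

    One reading of the rules, assumed for this sketch:
    - an event extends the session if it arrives at most `timeout` after the
      previous event; otherwise the session ends at previous_event + timeout;
    - max-duration checks run at multiples of `max_duration`; the session is
      closed at the first check time c where c - start >= max_duration.
    """
    windows = []
    current = []
    start = duration_close = None
    for t in sorted(timestamps):
        if current:
            timed_out = t - current[-1] > timeout
            too_long = t >= duration_close
            if timed_out or too_long:
                end = min(current[-1] + timeout, duration_close)
                windows.append((start, end, current))
                current = []
        if not current:
            start = t
            # First check time (a multiple of max_duration) at which the
            # session has been open for at least max_duration.
            duration_close = (math.ceil(start / max_duration) + 1) * max_duration
        current.append(t)
    if current:
        windows.append((start, min(current[-1] + timeout, duration_close), current))
    return windows


events = [1, 2, 4, 12, 16, 20, 24, 28, 32]
for window in session_windows(events, timeout=5, max_duration=10):
    print(window)
# (1, 9, [1, 2, 4])               closed by the timeout after event 4
# (12, 30, [12, 16, 20, 24, 28])  closed by the check at t = 30
# (32, 37, [32])
```

In this sketch the second session is not closed at t = 20 because it has only been open for 8 time units at that check, so it runs until the next check at t = 30. Under this reading a session that starts just after a check time can stay open for almost twice `max_duration`.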

When a partition key is provided, the events are grouped together by the key and the session window is applied to each group independently. This partitioning is useful for cases where you need different session windows for different users or devices.
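To illustrate the effect of a partition key, the Python sketch below groups `(key, time)` pairs by key and computes sessions for each key separately. To keep it short it applies only the timeout rule and omits the maximum duration. The names `sessions_by_timeout` and `partitioned_sessions`, and the device keys in the sample data, are hypothetical.

```python
from collections import defaultdict


def sessions_by_timeout(times, timeout):
    """Split times into sessions; a gap larger than `timeout` starts a new one.

    Only the timeout rule is modeled here (no maximum duration).
    """
    sessions = []
    for t in sorted(times):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def partitioned_sessions(events, timeout):
    """events: iterable of (key, time). Sessions are computed per key independently."""
    by_key = defaultdict(list)
    for key, t in events:
        by_key[key].append(t)
    return {key: sessions_by_timeout(times, timeout) for key, times in by_key.items()}


events = [("deviceA", 1), ("deviceB", 2), ("deviceA", 3), ("deviceB", 9), ("deviceA", 10)]
print(partitioned_sessions(events, timeout=5))
# {'deviceA': [[1, 3], [10]], 'deviceB': [[2], [9]]}
```

Without partitioning, the same five timestamps would form the sessions `[[1, 2, 3], [9, 10]]`, because events from the two devices would share one session timeline.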

Snapshot window

Snapshot windows group events that have the same timestamp. Unlike other windowing types, which require a specific window function (such as SessionWindow()), you can apply a snapshot window by adding System.Timestamp() to the GROUP BY clause.
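Conceptually, a snapshot window is just a group-by on the exact timestamp. The Python sketch below shows that idea with plain numeric timestamps. It only mirrors the concept of grouping by System.Timestamp() and does not use any Stream Analytics API. The function name `snapshot_counts` is hypothetical.

```python
from collections import Counter


def snapshot_counts(timestamps):
    """Count events per snapshot window, that is, per distinct timestamp."""
    # Events with equal timestamps fall into the same group, much like
    # grouping by System.Timestamp() in a query.
    return dict(sorted(Counter(timestamps).items()))


print(snapshot_counts([5, 5, 7, 9, 9, 9]))
# {5: 2, 7: 1, 9: 3}
```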


Next steps