Understand time handling in Azure Stream Analytics

In this article, you learn how to make design choices to solve practical time handling problems in Azure Stream Analytics jobs. Time handling design decisions are closely related to event ordering factors.

Background time concepts

To better frame the discussion, let's define some background concepts:

  • Event time: The time when the original event happened. For example, when a moving car on the highway approaches a toll booth.

  • Processing time: The time when the event reaches the processing system and is observed. For example, when a toll booth sensor sees the car and the computer system takes a few moments to process the data.

  • Watermark: An event time marker that indicates up to what point events have been ingressed to the streaming processor. Watermarks let the system indicate clear progress on ingesting the events. By the nature of streams, the incoming event data never stops, so watermarks indicate the progress to a certain point in the stream.

    The watermark concept is important. Watermarks allow Stream Analytics to determine when the system can produce complete, correct, and repeatable results that don't need to be retracted. The processing can be done in a predictable and repeatable way. For example, if a recount needs to be done for some error handling condition, watermarks are safe starting and ending points.

For additional resources on this subject, see Tyler Akidau's blog posts Streaming 101 and Streaming 102.

Choose the best starting time

Stream Analytics gives users two choices for picking event time: arrival time and application time.

Arrival time

Arrival time is assigned at the input source when the event reaches the source. You can access arrival time by using the EventEnqueuedUtcTime property for Event Hubs input, the IoTHub.EnqueuedTime property for IoT Hub input, and the BlobProperties.LastModified property for blob input.

Arrival time is used by default and is best suited for data archiving scenarios where temporal logic isn't necessary.
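
The following query is a minimal sketch of an arrival-time archiving job. The input and output aliases (EventHubInput, ArchiveOutput) are placeholders for whatever names your job defines; EventEnqueuedUtcTime is the arrival-time property exposed on Event Hubs inputs, as described above.

```sql
-- Archive every event together with its broker-assigned arrival time.
-- No TIMESTAMP BY clause is used, so events are processed by arrival time (the default).
SELECT
    EventEnqueuedUtcTime AS ArrivalTime,
    *
INTO ArchiveOutput
FROM EventHubInput
```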

Application time (also named Event Time)

Application time is assigned when the event is generated, and it's part of the event payload. To process events by application time, use the TIMESTAMP BY clause in the SELECT query. If TIMESTAMP BY is absent, events are processed by arrival time.

It's important to use a timestamp in the payload when temporal logic is involved, to account for delays in the source system or in the network. The time assigned to an event is available in SYSTEM.TIMESTAMP.
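
As a sketch, the following query timestamps events by a payload field. The input and output aliases (TollInput, TollOutput) and the field names (EventTime, TollId, LicensePlate) are assumptions for illustration only:

```sql
-- Process events by the application time carried in the payload.
SELECT
    System.Timestamp AS AssignedTime,  -- the time Stream Analytics assigned to the event
    TollId,
    LicensePlate
INTO TollOutput
FROM TollInput TIMESTAMP BY EventTime
```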

How time progresses in Azure Stream Analytics

When you use application time, the time progression is based on the incoming events. It's difficult for the stream processing system to know whether there are no events or whether events are delayed. For this reason, Azure Stream Analytics generates heuristic watermarks in the following ways for each input partition:

  • When there's any incoming event, the watermark is the largest event time Stream Analytics has seen so far minus the out-of-order tolerance window size.

  • When there's no incoming event, the watermark is the current estimated arrival time minus the late arrival tolerance window. The estimated arrival time is the time that has elapsed since the last input event was seen, plus that input event's arrival time.

    The arrival time can only be estimated, because the real arrival time is generated on the input event broker, such as Event Hubs, not on the Azure Stream Analytics VM that processes the events. Both rules are summarized in the sketch after this list.
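
The two rules above can be written compactly as follows. This is a sketch in this article's terms, not official notation: W is the per-partition watermark, t_max the largest event time observed so far, o the out-of-order tolerance window, l the late arrival tolerance window, a_last the arrival time of the last observed event, and e the wall-clock time elapsed since that event was observed.

```latex
\begin{aligned}
\text{with incoming events:}\quad    & W = t_{\max} - o \\
\text{without incoming events:}\quad & W = \underbrace{\left(a_{\text{last}} + e\right)}_{\text{estimated arrival time}} - \; l
\end{aligned}
```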

The design serves two additional purposes other than generating watermarks:

  1. The system generates results in a timely fashion with or without incoming events.

    You have control over how timely you want to see the output results. In the Azure portal, on the Event ordering page of your Stream Analytics job, you can configure the Out of order events setting. When you configure that setting, consider the trade-off of timeliness against tolerance of out-of-order events in the event stream.

    The late arrival tolerance window is necessary to keep generating watermarks, even in the absence of incoming events. At times, there may be a period where no incoming events come in, like when an event input stream is sparse. That problem is exacerbated by the use of multiple partitions in the input event broker.

    Streaming data processing systems without a late arrival tolerance window may suffer from delayed outputs when inputs are sparse and multiple partitions are used.

  2. The system behavior needs to be repeatable. Repeatability is an important property of a streaming data processing system.

    The watermark is derived from the arrival time and the application time. Both are persisted in the event broker, and thus repeatable. When an arrival time is estimated in the absence of events, Azure Stream Analytics journals the estimated arrival time so that the result is repeatable during replay for failure recovery.

When you choose to use arrival time as the event time, you don't need to configure the out-of-order tolerance and late arrival tolerance. Because arrival time is guaranteed to be increasing in the input event broker, Azure Stream Analytics simply disregards those configurations.

Late arriving events

By definition of the late arrival tolerance window, for each incoming event, Azure Stream Analytics compares the event time with the arrival time. If the event time is outside of the tolerance window, you can configure the system to either drop the event or adjust the event's time to be within the tolerance.

Once watermarks are generated, the service can potentially receive events with an event time lower than the watermark. You can configure the service to either drop those events or adjust the event's time to the watermark value.

As a part of the adjustment, the event's System.Timestamp is set to the new value, but the event time field itself is not changed. This adjustment is the only situation where an event's System.Timestamp can differ from the value in the event time field, and it may cause unexpected results to be generated.
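
Because only System.Timestamp is rewritten, you can compare it with the payload time field to spot adjusted events. The following is a sketch only, assuming an input alias of SensorInput and a payload field named EventTime; adjust the names and the CAST to match your schema.

```sql
-- Emit events whose assigned time (System.Timestamp) was moved away from the payload event time.
WITH Timestamped AS (
    SELECT
        System.Timestamp AS AssignedTime,
        EventTime
    FROM SensorInput TIMESTAMP BY EventTime
)
SELECT
    AssignedTime,
    EventTime,
    DATEDIFF(second, CAST(EventTime AS datetime), AssignedTime) AS AdjustmentSeconds
INTO AdjustedEventsOutput
FROM Timestamped
WHERE DATEDIFF(second, CAST(EventTime AS datetime), AssignedTime) > 0
```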

Handle time variation with substreams

The heuristic watermark generation mechanism described here works well in most cases where time is mostly synchronized between the various event senders. However, in real life, especially in many IoT scenarios, the system has little control over the clock on the event senders. The event senders could be all sorts of devices in the field, perhaps on different versions of hardware and software.

Instead of using a watermark that is global to all events in an input partition, Stream Analytics has another mechanism called substreams. You can utilize substreams in your job by writing a job query that uses the TIMESTAMP BY clause and the OVER keyword. To designate the substream, provide a key column name after the OVER keyword, such as deviceid, so that the system applies time policies by that column. Each substream gets its own independent watermark. This mechanism is useful for timely output generation when dealing with large clock skews or network delays among event senders.
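
A sketch of a substream query follows; TollInput, PerDeviceOutput, EventTime, and DeviceId are placeholder names. The OVER DeviceId clause gives each device its own watermark, so a device with a slow clock doesn't hold back output for the others.

```sql
-- Per-device tumbling-window count; each DeviceId forms its own substream.
SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp AS WindowEnd
INTO PerDeviceOutput
FROM TollInput TIMESTAMP BY EventTime OVER DeviceId
GROUP BY DeviceId, TumblingWindow(minute, 5)
```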

Substreams are a unique solution provided by Azure Stream Analytics, and are not offered by other streaming data processing systems.

When you use substreams, Stream Analytics applies the late arrival tolerance window to incoming events. The late arrival tolerance decides the maximum amount by which different substreams can be apart from each other. For example, if Device 1 is at Timestamp 1 and Device 2 is at Timestamp 2, the late arrival tolerance needs to be at least Timestamp 2 minus Timestamp 1. The default setting of 5 seconds is likely too small for devices with divergent timestamps. We recommend that you start with 5 minutes and make adjustments according to your devices' clock skew pattern.

Early arriving events

You may have noticed another concept called the early arrival window that looks like the opposite of the late arrival tolerance window. This window is fixed at 5 minutes and serves a different purpose from the late arrival tolerance window.

Because Azure Stream Analytics guarantees complete results, you can only specify job start time as the first output time of the job, not the input time. The job start time is required so that the complete window is processed, not just from the middle of the window.

Stream Analytics derives the start time from the query specification. However, because the input event broker is only indexed by arrival time, the system has to translate the starting event time to an arrival time. The system can then start processing events from that point in the input event broker. With the early arrival window limit, the translation is straightforward: starting event time minus the 5-minute early arrival window. This calculation also means that the system drops all events whose event time is more than 5 minutes earlier than the arrival time. The early input events metric is incremented when those events are dropped.
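
As a small worked illustration with made-up times (not from this article's data set), the translation and the drop rule look like this:

```latex
\begin{aligned}
\text{read from arrival time} &= \text{starting event time} - 5\ \text{min} = 12{:}00 - 0{:}05 = 11{:}55 \\
\text{drop an event when: } & \ \text{event time} > \text{arrival time} + 5\ \text{min}
\end{aligned}
```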

This concept ensures the processing is repeatable no matter where you start to output from. Without such a mechanism, it would not be possible to guarantee repeatability, as many other streaming systems claim they do.

Side effects of event ordering time tolerances

Stream Analytics jobs have several Event ordering options. Two can be configured in the Azure portal: the Out of order events setting (out-of-order tolerance) and the Events that arrive late setting (late arrival tolerance). The early arrival tolerance is fixed and cannot be adjusted. These time policies are used by Stream Analytics to provide strong guarantees. However, the settings sometimes have unexpected implications:

  1. Accidentally sending events that are too early.

    Early events should not normally be emitted. However, early events can occur if the sender's clock is running too fast. All early arriving events are dropped, so you won't see any of them in the output.

  2. Sending old events to Event Hubs to be processed by Azure Stream Analytics.

    While old events may seem harmless at first, they may be dropped because of the late arrival tolerance. If the events are too old, the System.Timestamp value is altered during event ingestion. Because of this behavior, Azure Stream Analytics is currently better suited for near-real-time event processing scenarios than for historical event processing scenarios. In some cases, you can set the Events that arrive late time to the largest possible value (20 days) to work around this behavior.

  3. Outputs seem to be delayed.

    The first watermark is generated at the calculated time: the maximum event time the system has observed so far, minus the out-of-order tolerance window size. By default, the out-of-order tolerance is configured to zero (00 minutes and 00 seconds). When you set it to a higher, non-zero value, the streaming job's first output is delayed by that value of time (or greater) because of how the first watermark time is calculated. A worked example follows this list.

  4. Inputs are sparse.

    When there is no input in a given partition, the watermark time is calculated as the arrival time minus the late arrival tolerance window. As a result, if input events are infrequent and sparse, the output can be delayed by that amount of time. The default Events that arrive late value is 5 seconds. You should expect to see some delay when sending input events one at a time, for example. The delays can get worse when you set the Events that arrive late window to a large value.

  5. The System.Timestamp value is different from the time in the event time field.

    As described previously, the system adjusts event time by the out-of-order tolerance or late arrival tolerance windows. The System.Timestamp value of the event is adjusted, but the event time field is not. You can use this difference to identify which events had their timestamps adjusted. Normally the two values are the same, unless the system changed the timestamp because of one of the tolerances.
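
For the delayed-output effect in item 3, here is a worked example with an illustrative (not default) setting: if the out-of-order tolerance is set to 10 minutes, the first watermark, and therefore the first output, trails the largest observed event time by at least that amount.

```latex
W_{\text{first}} = t_{\max} - 10\ \text{min}
\quad\Longrightarrow\quad
\text{first output is delayed by at least } 10\ \text{min}
```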

Metrics to observe

You can observe a number of the event ordering time tolerance effects through Stream Analytics job metrics. The following metrics are relevant:

Metric | Description
Out-of-Order Events | Indicates the number of events received out of order that were either dropped or given an adjusted timestamp. This metric is directly affected by the configuration of the Out of order events setting on the Event ordering page of the job in the Azure portal.
Late Input Events | Indicates the number of events arriving late from the source. This metric includes events that have been dropped or have had their timestamp adjusted. It is directly affected by the configuration of the Events that arrive late setting on the Event ordering page of the job in the Azure portal.
Early Input Events | Indicates the number of events arriving early from the source that have either been dropped, or had their timestamp adjusted because they were more than 5 minutes early.
Watermark Delay | Indicates the delay of the streaming data processing job. See more information in the following section.

Watermark delay details

The Watermark delay metric is computed as the wall clock time of the processing node minus the largest watermark it has seen so far. For more information, see the watermark delay blog post.
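
In the notation of this sketch (c for the processing node's wall-clock time, W_max for the largest watermark it has seen so far):

```latex
\text{watermark delay} = c - W_{\max}
```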

There can be several reasons this metric value is larger than 0 under normal operation:

  1. Inherent processing delay of the streaming pipeline. Normally this delay is nominal.

  2. The out-of-order tolerance window introduces delay, because the watermark is reduced by the size of the tolerance window.

  3. The late arrival window introduces delay, because the watermark is reduced by the size of the tolerance window.

  4. Clock skew of the processing node generating the metric.

There are a number of other resource constraints that can cause the streaming pipeline to slow down. The watermark delay metric can rise due to:

  1. Not enough processing resources in Stream Analytics to handle the volume of input events. To scale up resources, see Understand and adjust Streaming Units.

  2. Not enough throughput within the input event brokers, so they are throttled. For possible solutions, see Automatically scale up Azure Event Hubs throughput units.

  3. Output sinks are not provisioned with enough capacity, so they are throttled. The possible solutions vary widely based on the flavor of output service being used.

Output event frequency

Azure Stream Analytics uses watermark progress as the only trigger to produce output events. Because the watermark is derived from input data, it is repeatable during failure recovery and also in user-initiated reprocessing. When you use windowed aggregates, the service only produces outputs at the end of the windows. In some cases, users may want to see partial aggregates generated from the windows. Partial aggregates are not currently supported in Azure Stream Analytics.
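
For example, a windowed aggregation such as the following sketch emits exactly one row per window, and only when the watermark passes the window end. The input, output, and field names (TollInput, WindowedOutput, EventTime, TollId, TollAmount) are placeholders.

```sql
-- One output row per 10-minute hopping window (hopping every 5 minutes),
-- produced when the watermark reaches the end of each window.
SELECT
    TollId,
    AVG(TollAmount) AS AverageToll,
    System.Timestamp AS WindowEnd
INTO WindowedOutput
FROM TollInput TIMESTAMP BY EventTime
GROUP BY TollId, HoppingWindow(minute, 10, 5)
```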

In other streaming solutions, output events could be materialized at various trigger points, depending on external circumstances. In some solutions, the output events for a given time window could be generated multiple times, and as the input values are refined, the aggregate results become more accurate: events could be speculated at first and revised over time. For example, when a certain device is offline from the network, the system could use an estimated value. Later, when the same device comes back online, the actual event data could be included in the input stream, and processing that time window would produce more accurate output.

Illustrated example of watermarks

The following images illustrate how watermarks progress in different circumstances.

This table shows the example data that is charted below. Notice that the event time and the arrival time vary, sometimes matching and sometimes not.

Event time | Arrival time | DeviceId
12:07 | 12:07 | device1
12:08 | 12:08 | device2
12:17 | 12:11 | device1
12:08 | 12:13 | device3
12:19 | 12:16 | device1
12:12 | 12:17 | device3
12:17 | 12:18 | device2
12:20 | 12:19 | device2
12:16 | 12:21 | device3
12:23 | 12:22 | device2
12:22 | 12:24 | device2
12:21 | 12:27 | device3

In this illustration, the following tolerances are used:

  • Early arrival window is 5 minutes
  • Late arrival window is 5 minutes
  • Reorder window is 2 minutes
  1. Illustration of the watermark progressing through these events:

    Illustration of Azure Stream Analytics watermarks

    Notable processes illustrated in the preceding graphic:

    1. The first event (device1) and the second event (device2) have aligned times and are processed without adjustments. The watermark progresses on each event.

    2. When the third event (device1) is processed, the arrival time (12:11) precedes the event time (12:17). The event arrived 6 minutes early, so it is dropped because of the 5-minute early arrival tolerance.

      The watermark doesn't progress in this case of an early event.

    3. The fourth event (device3) and the fifth event (device1) have aligned times and are processed without adjustment. The watermark progresses on each event.

    4. When the sixth event (device3) is processed, the arrival time is 12:17 and the event time (12:12) is below the watermark. The event time is adjusted to the watermark level (12:17).

    5. When the twelfth event (device3) is processed, the arrival time (12:27) is 6 minutes ahead of the event time (12:21). The late arrival policy is applied, and the event time is adjusted to 12:22, which is above the watermark (12:21), so no further adjustment is applied.

  2. Second illustration of the watermark progressing without an early arrival policy:

    Illustration of Azure Stream Analytics watermarks without an early arrival policy

    In this example, no early arrival policy is applied. Outlier events that arrive early raise the watermark significantly. Notice that the third event (device1, arriving at 12:11 with event time 12:17) is not dropped in this scenario, and the watermark is raised to 12:15. As a result, the fourth event time is adjusted forward 7 minutes (from 12:08 to 12:15).

  3. In the final illustration, substreams are used (OVER the DeviceId). Multiple watermarks are tracked, one per substream. As a result, there are fewer events with their times adjusted.

    Illustration of Azure Stream Analytics watermarks with substreams

Next steps