Azure Stream Analytics solution patterns

Like many other services in Azure, Stream Analytics is best used with other services to create a larger end-to-end solution. This article discusses simple Azure Stream Analytics solutions and various architectural patterns. You can build on these patterns to develop more complex solutions. The patterns described in this article can be used in a wide variety of scenarios. Examples of scenario-specific patterns are available in Azure solution architectures.

Use SQL for dashboard

The Power BI dashboard offers low latency, but it cannot be used to produce full-fledged Power BI reports. A common reporting pattern is to output your data to SQL Database first. Then use Power BI's SQL connector to query SQL for the latest data.

Diagram: ASA SQL dashboard

Using SQL Database gives you more flexibility, but at the expense of a slightly higher latency. This solution is optimal for jobs with latency requirements greater than one second. With this method, you can maximize Power BI capabilities to further slice and dice the data for reports, with many more visualization options. You also gain the flexibility of using other dashboard solutions, such as Tableau.

SQL is not a high-throughput data store. The maximum throughput to SQL Database from Azure Stream Analytics is currently around 24 MB/s. If the event sources in your solution produce data at a higher rate, you need to use processing logic in Stream Analytics to reduce the output rate to SQL. Techniques such as filtering, windowed aggregates, pattern matching with temporal joins, and analytic functions can be used, as in the sketch below. The output rate to SQL can be further optimized using techniques described in Azure Stream Analytics output to Azure SQL Database.
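As an illustration of rate reduction, a windowed aggregate can collapse a high-rate event stream into one row per device per window before it reaches SQL. The following is a minimal sketch in the Stream Analytics query language; the input name `input`, the SQL Database output name `sqloutput`, and the `deviceId`, `temperature`, and `eventTime` fields are all hypothetical:

```sql
-- Reduce the output rate to SQL: emit one aggregated row per device
-- every 10 seconds instead of forwarding every raw event
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    COUNT(*) AS eventCount,
    System.Timestamp() AS windowEnd
INTO sqloutput
FROM input TIMESTAMP BY eventTime
GROUP BY deviceId, TumblingWindow(second, 10)
```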

Incorporate real-time insights into your application with event messaging

The second most popular use of Stream Analytics is to generate real-time alerts. In this solution pattern, business logic in Stream Analytics can be used to detect temporal and spatial patterns or anomalies, then produce alerting signals. However, unlike the dashboard solution where Stream Analytics uses Power BI as a preferred endpoint, a number of intermediate data sinks can be used. These sinks include Event Hubs, Service Bus, and Azure Functions. You, as the application builder, need to decide which data sink works best for your scenario.
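For example, an alerting query can fire only on the transition into a bad state rather than on every reading above a threshold. A sketch, assuming hypothetical input/output names, a `temperature` field, and a fixed threshold of 100:

```sql
-- Alert on the rising edge only: the current reading exceeds the threshold
-- while the previous reading from the same device (within 5 minutes) did not.
-- The first event per device yields NULL from LAG and is filtered out.
SELECT
    deviceId,
    temperature,
    System.Timestamp() AS alertTime
INTO alertoutput
FROM input TIMESTAMP BY eventTime
WHERE temperature > 100
  AND LAG(temperature) OVER (PARTITION BY deviceId LIMIT DURATION(minute, 5)) <= 100
```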

Downstream event consumer logic must be implemented to generate alerts in your existing business workflow. Because you can implement custom logic in Azure Functions, Azure Functions is the fastest way you can perform this integration. A tutorial for using Azure Functions as the output for a Stream Analytics job can be found in Run Azure Functions from Azure Stream Analytics jobs. Azure Functions also supports various types of notifications, including text and email. Logic Apps may also be used for such integration, with Event Hubs between Stream Analytics and Logic Apps.

Diagram: ASA event messaging app

Event Hubs, on the other hand, offers the most flexible integration point. Many other services, like Azure Data Explorer and Time Series Insights, can consume events from Event Hubs. Services can be connected directly to the Event Hubs sink from Azure Stream Analytics to complete the solution. Event Hubs is also the highest-throughput messaging broker available on Azure for such integration scenarios.

Dynamic applications and websites

You can create custom real-time visualizations, such as dashboard or map visualizations, using Azure Stream Analytics and Azure SignalR Service. With SignalR, web clients can be updated and show dynamic content in real time.

Diagram: ASA dynamic app

Incorporate real-time insights into your application through data stores

Most web services and web applications today use a request/response pattern to serve the presentation layer. The request/response pattern is simple to build and can be easily scaled with low response time using a stateless frontend and scalable stores, like Cosmos DB.

High data volume often creates performance bottlenecks in a CRUD-based system. The event sourcing solution pattern is used to address these bottlenecks. Temporal patterns and insights are also difficult and inefficient to extract from a traditional data store. Modern high-volume, data-driven applications often adopt a dataflow-based architecture. As the compute engine for data in motion, Azure Stream Analytics is the linchpin in that architecture.

Diagram: ASA event sourcing app

In this solution pattern, events are processed and aggregated into data stores by Azure Stream Analytics. The application layer interacts with the data stores using the traditional request/response pattern. Because of Stream Analytics' ability to process a large number of events in real time, the application is highly scalable without the need to bulk up the data store layer. The data store layer is essentially a materialized view in the system. Azure Stream Analytics output to Azure Cosmos DB describes how Cosmos DB is used as a Stream Analytics output.
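A minimal sketch of such a materialized view, assuming a hypothetical Cosmos DB output named `cosmosoutput` configured with `deviceId` as the document id, so each window's result upserts the previous document for that device:

```sql
-- Maintain a per-device aggregate as a materialized view in Cosmos DB;
-- one output document per device is refreshed every minute
SELECT
    deviceId,
    COUNT(*) AS eventCount,
    MAX(temperature) AS maxTemperature,
    System.Timestamp() AS lastUpdated
INTO cosmosoutput
FROM input TIMESTAMP BY eventTime
GROUP BY deviceId, TumblingWindow(minute, 1)
```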

In real applications where the processing logic is complex and certain parts of the logic need to be upgraded independently, multiple Stream Analytics jobs can be composed together with Event Hubs as the intermediary event broker.

Diagram: ASA complex event sourcing app

This pattern improves the resiliency and manageability of the system. However, even though Stream Analytics guarantees exactly-once processing, there is a small chance that duplicate events may land in the intermediary Event Hubs. It's important for the downstream Stream Analytics job to dedupe events using logic keys in a lookback window. For more information on event delivery, see the Event Delivery Guarantees reference.
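One way to express this deduplication in the Stream Analytics query language is with ISFIRST, keeping only the first occurrence of each logical key within an interval. A sketch under the assumption that each event carries a unique `eventId`; note that duplicates straddling an interval boundary can still slip through, so the interval should comfortably exceed the expected duplication window:

```sql
-- Keep only the first event per eventId within each 5-minute interval;
-- later duplicates of the same eventId return 0 from ISFIRST and are dropped
SELECT *
INTO dedupedoutput
FROM input TIMESTAMP BY eventTime
WHERE ISFIRST(minute, 5) OVER (PARTITION BY eventId) = 1
```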

Use reference data for application customization

The Azure Stream Analytics reference data feature is designed specifically for end-user customization like alerting thresholds, processing rules, and geofences. The application layer can accept parameter changes and store them in SQL Database. The Stream Analytics job periodically queries the database for changes and makes the customization parameters accessible through a reference data join. For more information on how to use reference data for application customization, see reference data join.
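A hedged sketch of such a reference data join, assuming a hypothetical reference input named `thresholds` with `deviceId` and `maxTemperature` columns maintained by the application layer:

```sql
-- Join the stream with per-device thresholds from reference data and
-- emit an alert whenever a reading exceeds its device's configured limit
SELECT
    i.deviceId,
    i.temperature,
    r.maxTemperature AS threshold,
    System.Timestamp() AS alertTime
INTO alertoutput
FROM input i TIMESTAMP BY eventTime
JOIN thresholds r
  ON i.deviceId = r.deviceId
WHERE i.temperature > r.maxTemperature
```

Because `thresholds` is reference data, the join needs no DATEDIFF condition; each event is matched against the reference snapshot that is current at its event time.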

This pattern can also be used to implement a rules engine where the thresholds of the rules are defined in reference data. For more information on rules, see Process configurable threshold-based rules in Azure Stream Analytics.

Diagram: ASA reference data app

Add Machine Learning to your real-time insights

Azure Stream Analytics' built-in Anomaly Detection model is a convenient way to introduce Machine Learning to your real-time application.
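For instance, the built-in spike-and-dip model can be invoked directly from the query language. A minimal sketch, assuming a numeric `temperature` field, a 95% confidence level, and a 120-event history over a 120-second sliding window (all hypothetical tuning choices):

```sql
-- Score each event with the built-in spike-and-dip anomaly detection model
WITH AnomalyDetectionStep AS (
    SELECT
        deviceId,
        CAST(temperature AS float) AS temperature,
        AnomalyDetection_SpikeAndDip(CAST(temperature AS float), 95, 120, 'spikesanddips')
            OVER (PARTITION BY deviceId LIMIT DURATION(second, 120)) AS scores
    FROM input TIMESTAMP BY eventTime
)
SELECT
    deviceId,
    temperature,
    CAST(GetRecordPropertyValue(scores, 'Score') AS float) AS anomalyScore,
    CAST(GetRecordPropertyValue(scores, 'IsAnomaly') AS bigint) AS isAnomaly
INTO anomalyoutput
FROM AnomalyDetectionStep
```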

For advanced users who want to incorporate online training and scoring into the same Stream Analytics pipeline, see this example of how to do that with linear regression.

Diagram: ASA machine learning app

Real-time data warehousing

Another common pattern is real-time data warehousing, also called streaming data warehouse. In addition to events arriving at Event Hubs and IoT Hub from your application, Azure Stream Analytics running on IoT Edge can be used to fulfill data cleansing, data reduction, and data store and forward needs. Stream Analytics running on IoT Edge can gracefully handle bandwidth limitations and connectivity issues in the system. Stream Analytics can support throughput rates of up to 200 MB/s while writing to Azure Synapse Analytics.
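A sketch of such a cleansing-and-reduction step, assuming a hypothetical Synapse Analytics output named `synapseoutput` and events that may arrive with missing or malformed readings:

```sql
-- Cleanse (drop malformed events) and reduce (aggregate per minute)
-- before loading rows into the streaming data warehouse
SELECT
    deviceId,
    AVG(TRY_CAST(temperature AS float)) AS avgTemperature,
    COUNT(*) AS readings,
    System.Timestamp() AS windowEnd
INTO synapseoutput
FROM input TIMESTAMP BY eventTime
WHERE deviceId IS NOT NULL
  AND TRY_CAST(temperature AS float) IS NOT NULL
GROUP BY deviceId, TumblingWindow(minute, 1)
```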

Diagram: ASA data warehousing

Archiving real-time data for analytics

Most data science and analytics activities still happen offline. Data can be archived by Azure Stream Analytics through the Azure Data Lake Store Gen2 output and Parquet output formats. This capability removes the friction to feed data directly into Azure Data Lake Analytics, Azure Databricks, and Azure HDInsight. Azure Stream Analytics is used as a near real-time ETL engine in this solution. You can explore archived data in Data Lake using various compute engines.
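The archiving job itself can be a simple pass-through query; the Data Lake Storage Gen2 output configuration (Parquet format and a `{date}/{time}` path pattern) determines how events are bucketed on disk. A sketch with hypothetical field and output names:

```sql
-- Near real-time ETL: stamp each event with its event time and archive it;
-- the output's {date}/{time} path pattern buckets files by this timestamp
SELECT
    deviceId,
    temperature,
    System.Timestamp() AS eventTimestamp
INTO datalakeoutput
FROM input TIMESTAMP BY eventTime
```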

Diagram: ASA offline analytics

Use reference data for enrichment

Data enrichment is often a requirement for ETL engines. Azure Stream Analytics supports data enrichment with reference data from both SQL Database and Azure Blob storage. Data enrichment can be done for data landing in both Azure Data Lake and SQL Data Warehouse.
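A minimal enrichment sketch, assuming a hypothetical Blob-backed reference input named `devicemetadata` with `deviceId`, `siteName`, and `region` columns:

```sql
-- Enrich each event with device metadata from reference data before archiving
SELECT
    i.deviceId,
    i.temperature,
    r.siteName,
    r.region,
    System.Timestamp() AS eventTimestamp
INTO datalakeoutput
FROM input i TIMESTAMP BY eventTime
JOIN devicemetadata r
  ON i.deviceId = r.deviceId
```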

Diagram: ASA offline analytics with data enrichment

Operationalize insights from archived data

If you combine the offline analytics pattern with the near real-time application pattern, you can create a feedback loop. The feedback loop lets the application automatically adjust for changing patterns in the data. This feedback loop can be as simple as changing the threshold value for alerting, or as complex as retraining Machine Learning models. The same solution architecture can be applied to ASA jobs running both in the cloud and on IoT Edge.

Diagram: ASA insights operationalization

How to monitor ASA jobs

An Azure Stream Analytics job can run 24/7 to process incoming events continuously in real time. Its uptime guarantee is crucial to the health of the overall application. While Stream Analytics is the only streaming analytics service in the industry that offers a 99.9% availability guarantee, you may still incur some level of downtime. Over the years, Stream Analytics has introduced metrics, logs, and job states to reflect the health of the jobs. All of them are surfaced through the Azure Monitor service and can be further exported to OMS. For more information, see Understand Stream Analytics job monitoring and how to monitor queries.

Diagram: ASA monitoring

There are two key things to monitor:

  • Job failed state

    First and foremost, you need to make sure the job is running. Without the job in the running state, no new metrics or logs are generated. Jobs can change to a failed state for various reasons, including having a high SU utilization level (that is, running out of resources).

  • Watermark delay metrics

    This metric reflects how far behind your processing pipeline is in wall clock time (in seconds). Some of the delay is attributable to the inherent processing logic. As a result, monitoring the increasing trend is much more important than monitoring the absolute value. Steady-state delay should be addressed by your application design, not by monitoring or alerts.

Upon failure, activity logs and diagnostics logs are the best places to begin looking for errors.

Build resilient and mission critical applications

Regardless of Azure Stream Analytics' SLA guarantee and how carefully you run your end-to-end application, outages happen. If your application is mission critical, you need to be prepared for outages in order to recover gracefully.

For alerting applications, the most important thing is to detect the next alert. You may choose to restart the job from the current time when recovering, ignoring past alerts. The job start time semantics are defined by the first output time, not the first input time. The input is rewound backwards an appropriate amount of time to guarantee that the first output at the specified time is complete and correct. As a result, you won't get partial aggregates that trigger alerts unexpectedly.

You may also choose to start output from some amount of time in the past. The retention policies of both Event Hubs and IoT Hub hold a reasonable amount of data to allow processing from the past. The tradeoff is how fast you can catch up to the current time and start to generate timely new alerts. Data loses its value rapidly over time, so it's important to catch up to the current time quickly. There are two ways to catch up quickly:

  • Provision more resources (SUs) when catching up.
  • Restart from the current time.

Restarting from the current time is simple to do, with the tradeoff of leaving a gap during processing. Restarting this way might be OK for alerting scenarios, but can be problematic for dashboard scenarios and is a non-starter for archiving and data warehousing scenarios.

Provisioning more resources can speed up the process, but the effect of a processing rate surge is complex:

  • Test that your job is scalable to a larger number of SUs. Not all queries are scalable. You need to make sure your query is parallelized.

  • Make sure there are enough partitions in the upstream Event Hubs or IoT Hub so that you can add more Throughput Units (TUs) to scale the input throughput. Remember, each Event Hubs TU maxes out at an output rate of 2 MB/s.

  • Make sure you have provisioned enough resources in the output sinks (for example, SQL Database, Cosmos DB) so they don't throttle the surge in output, which can sometimes cause the system to lock up.

The most important thing is to anticipate the processing rate change, test these scenarios before going into production, and be ready to scale the processing correctly during failure recovery time.

In the extreme scenario where incoming events are all delayed, it's possible that all of them are dropped if you have applied a late arriving window to your job. The dropping of the events may appear to be mysterious behavior at first; however, considering that Stream Analytics is a real-time processing engine, it expects incoming events to be close to wall clock time. It has to drop events that violate these constraints.

Lambda Architectures or Backfill process

Fortunately, the previous data archiving pattern can be used to process these late events gracefully. The idea is that the archiving job processes incoming events in arrival time and archives them into the right time bucket in Azure Blob or Azure Data Lake Store with their event time. It doesn't matter how late an event arrives; it will never be dropped. It will always land in the right time bucket. During recovery, it's possible to reprocess the archived events and backfill the results to the store of choice. This is similar to how lambda patterns are implemented.

Diagram: ASA backfill

The backfill process has to be done with an offline batch processing system, which most likely has a different programming model than Azure Stream Analytics. This means you have to re-implement the entire processing logic.

For backfilling, it's still important to at least temporarily provision more resources to the output sinks to handle higher throughput than the steady-state processing needs.

| Scenario | Restart from now only | Restart from last stopped time | Restart from now + backfill with archived events |
| --- | --- | --- | --- |
| Dashboarding | Creates gap | OK for short outage | Use for long outage |
| Alerting | Acceptable | OK for short outage | Not necessary |
| Event sourcing app | Acceptable | OK for short outage | Use for long outage |
| Data warehousing | Data loss | Acceptable | Not necessary |
| Offline analytics | Data loss | Acceptable | Not necessary |

Putting it all together

It's not hard to imagine that all the solution patterns mentioned above can be combined in a complex end-to-end system. The combined system can include dashboards, alerting, event sourcing applications, data warehousing, and offline analytics capabilities.

The key is to design your system in composable patterns, so each subsystem can be built, tested, upgraded, and recovered independently.

Next steps

You have now seen a variety of solution patterns using Azure Stream Analytics. Next, you can dive deep and create your first Stream Analytics job: