Set up alerts for Azure Stream Analytics jobs

It's important to monitor your Azure Stream Analytics job to ensure that it runs continuously without problems. This article describes how to set up alerts for common scenarios that should be monitored.

You can define alert rules on metrics from Operation Logs data either through the Azure portal or programmatically.
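As a rough illustration of the programmatic route, the sketch below assembles the JSON body of an activity log alert rule in the shape used by the Azure Resource Manager `Microsoft.Insights/activityLogAlerts` resource type. The field names (`scopes`, `condition.allOf`, `actions.actionGroups`) follow that resource schema as we understand it and should be checked against its reference before use; the `activity_log_alert_body` helper and all IDs are hypothetical placeholders. The example conditions mirror the failed-job alert configured in the portal steps that follow (category Administrative, status Failed); adding no level condition corresponds to leaving Event Level at All. Sending the body to Azure (for example through an ARM client or a template deployment) is not shown.

```python
import json


def activity_log_alert_body(job_resource_id, action_group_id, conditions, description=""):
    """Build a request body shaped like an ARM Microsoft.Insights/activityLogAlerts resource.

    Hypothetical helper: `conditions` maps activity log fields (for example
    "category" or "status") to the value each must equal. All of them must
    match, which is expressed with an "allOf" list.
    """
    return {
        "location": "Global",  # activity log alert rules are global resources
        "properties": {
            "scopes": [job_resource_id],  # restrict the rule to this job
            "condition": {
                "allOf": [
                    {"field": field, "equals": value}
                    for field, value in conditions.items()
                ]
            },
            "actions": {"actionGroups": [{"actionGroupId": action_group_id}]},
            "enabled": True,
            "description": description,
        },
    }


# Hypothetical resource IDs: substitute your own subscription, resource group and names.
job_id = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
          "/providers/Microsoft.StreamAnalytics/streamingjobs/<job-name>")
group_id = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
            "/providers/Microsoft.Insights/actionGroups/TIDashboardGroupActions")

# Conditions mirroring the portal example: Administrative operations that failed.
body = activity_log_alert_body(
    job_id,
    group_id,
    {"category": "Administrative", "status": "Failed"},
    description="Alert when the Stream Analytics job enters a failed state",
)
print(json.dumps(body, indent=2))
```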

Set up alerts in the Azure portal

Get alerted when a job stops unexpectedly

The following example shows how to set up an alert for when your job enters a failed state. This alert is recommended for all jobs.

  1. In the Azure portal, open the Stream Analytics job you want to create an alert for.

  2. On the job page, navigate to the Monitoring section.

  3. Select Metrics, and then select New alert rule.

    Azure portal Stream Analytics alerts setup

  4. Your Stream Analytics job name should appear automatically under RESOURCE. Select Add condition, and then select All Administrative operations under Configure signal logic.

    Select signal name for Stream Analytics alert

  5. Under Configure signal logic, change Event Level to All and Status to Failed. Leave Event initiated by blank, and then select Done.

    Configure signal logic for Stream Analytics alert

  6. Select an existing action group or create a new one. In this example, a new action group called TIDashboardGroupActions was created with an Emails action that sends an email to users with the Owner Azure Resource Manager role.

    Set up alerts for Azure Stream Analytics jobs

  7. The RESOURCE, CONDITION, and ACTION GROUPS sections should each have an entry. Note that an alert fires only when its defined conditions are met. For example, you can evaluate a metric's average value over the last 15 minutes, every 5 minutes.

    Create Stream Analytics alert rule

    Add an Alert rule name, a Description, and your Resource Group under ALERT DETAILS, and then select Create alert rule to create the rule for your Stream Analytics job.

    Create Stream Analytics alert rule
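To make the "average over the last 15 minutes, every 5 minutes" example from step 7 concrete, the following sketch models how such a windowed rule behaves: at each 5-minute evaluation it averages the samples from the preceding 15 minutes and fires when that average is greater than a threshold. It is a simplified, hypothetical model with made-up per-minute data and an illustrative function name (`evaluate_average_rule`), not Azure Monitor's actual evaluation engine.

```python
from datetime import datetime, timedelta
from statistics import mean


def evaluate_average_rule(samples, start, end, window, frequency, threshold):
    """Evaluate "average over `window` is greater than `threshold`" every `frequency`.

    `samples` is a list of (timestamp, value) pairs. Returns one
    (evaluation_time, average, fired) tuple per evaluation.
    """
    results = []
    t = start + window  # first evaluation once a full window of data exists
    while t <= end:
        # Samples in the half-open window (t - window, t]; windows overlap.
        in_window = [v for ts, v in samples if t - window < ts <= t]
        avg = mean(in_window) if in_window else None
        fired = avg is not None and avg > threshold
        results.append((t, avg, fired))
        t += frequency
    return results


start = datetime(2024, 1, 1, 0, 0)
# Hypothetical per-minute samples: value 1 for the first 20 minutes, then 10.
samples = [(start + timedelta(minutes=i), 1 if i <= 20 else 10) for i in range(1, 31)]

results = evaluate_average_rule(
    samples, start, start + timedelta(minutes=30),
    window=timedelta(minutes=15), frequency=timedelta(minutes=5), threshold=5,
)
for t, avg, fired in results:
    print(t.strftime("%H:%M"), avg, "FIRED" if fired else "ok")
```

With this data, only the evaluation at 00:30 fires: that is the first window in which two-thirds of the 15 samples hold the higher value, lifting the average to 7. Because the windows overlap, a short spike keeps influencing the average for up to three consecutive evaluations.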

Scenarios to monitor

The following alerts are recommended for monitoring the performance of your Stream Analytics job. Evaluate these metrics every minute over the last 5-minute period.

| Metric | Condition | Time aggregation | Threshold | Corrective actions |
| --- | --- | --- | --- | --- |
| SU% Utilization | Greater than | Maximum | 80 | There are multiple factors that increase SU% Utilization. You can scale with query parallelization or increase the number of streaming units. For more information, see Leverage query parallelization in Azure Stream Analytics. |
| Runtime errors | Greater than | Total | 0 | Examine the activity or resource logs and make appropriate changes to the inputs, query, or outputs. |
| Watermark delay | Greater than | Maximum | When the average value of this metric over the last 15 minutes is greater than the late arrival tolerance (in seconds). If you have not modified the late arrival tolerance, the default is 5 seconds. | Try increasing the number of SUs or parallelizing your query. For more information on SUs, see Understand and adjust Streaming Units. For more information on parallelizing your query, see Leverage query parallelization in Azure Stream Analytics. |
| Input deserialization errors | Greater than | Total | 0 | Examine the activity or resource logs and make appropriate changes to the input. |
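The recommended alerts can also be read as a small set of rules. The sketch below applies each row to one 5-minute window of hypothetical per-minute values, using the row's time aggregation (Maximum or Total) and its "greater than" condition. For the watermark delay row it uses the Maximum aggregation listed in the table, with the default 5-second late arrival tolerance as the threshold, so it does not model the 15-minute average described in that row. The `Rule` class, the `evaluate` function, and the sample values are illustrative assumptions, not an Azure API.

```python
from dataclasses import dataclass

# Default late arrival tolerance in seconds, if you have not modified it.
LATE_ARRIVAL_TOLERANCE_S = 5

# Time aggregations used in the table, applied to the values in one window.
AGGREGATIONS = {
    "maximum": max,
    "total": sum,
}


@dataclass
class Rule:
    metric: str
    aggregation: str  # "maximum" or "total", as in the table
    threshold: float  # the rule fires when the aggregate is greater than this


RECOMMENDED_RULES = [
    Rule("SU% Utilization", "maximum", 80),
    Rule("Runtime errors", "total", 0),
    Rule("Watermark delay", "maximum", LATE_ARRIVAL_TOLERANCE_S),
    Rule("Input deserialization errors", "total", 0),
]


def evaluate(rules, window_samples):
    """Return the names of the metrics whose rule fires for one evaluation.

    `window_samples` maps each metric name to its per-minute values over
    the last 5 minutes.
    """
    fired = []
    for rule in rules:
        values = window_samples.get(rule.metric, [])
        if not values:
            continue  # no data in this window, so nothing to evaluate
        aggregate = AGGREGATIONS[rule.aggregation](values)
        if aggregate > rule.threshold:
            fired.append(rule.metric)
    return fired


# Hypothetical per-minute values for the last 5 minutes.
window = {
    "SU% Utilization": [62, 71, 85, 78, 74],
    "Runtime errors": [0, 0, 0, 0, 0],
    "Watermark delay": [1, 2, 2, 3, 2],
    "Input deserialization errors": [0, 0, 3, 0, 0],
}
print(evaluate(RECOMMENDED_RULES, window))
# → ['SU% Utilization', 'Input deserialization errors']
```

Keeping the rules as data rather than as separate `if` statements makes it easy to change a threshold, for example when you raise the job's late arrival tolerance.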

Next steps