Troubleshoot Azure Stream Analytics queries

This article describes common issues you may encounter when developing Azure Stream Analytics queries, and explains how to troubleshoot and correct them. Many troubleshooting steps require resource logs to be enabled for your Stream Analytics job.

Query is not producing expected output

  1. Examine errors by testing locally:

    • On the Azure portal, on the Query tab, select Test. Use the downloaded sample data to test the query. Examine any errors and attempt to correct them.
  2. If you use Timestamp By, verify that the events have timestamps later than the job start time.

  3. Eliminate common pitfalls, such as:

    • A WHERE clause in the query filters out all events, so no output is generated.
    • A CAST function fails, causing the job to fail. To avoid type cast failures, use TRY_CAST instead.
    • When you use window functions, wait for the entire window duration to elapse before you expect output from the query.
    • The timestamps of the events precede the job start time, so the events are dropped.
    • JOIN conditions don't match. If there are no matches, there is no output.
  4. Ensure that event ordering policies are configured as expected. Go to Settings and select Event Ordering. The policy is not applied when you use the Test button to test the query. This is one difference between testing in the browser and running the job in production.

  5. Debug by using activity and resource logs.
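Items 2 and 3 of this list can also be checked against the downloaded sample data before you start the job. The following Python sketch is illustrative only and rests on assumptions: the sample is a JSON array of events, and `JOB_START` and the field names `EventTime` and `Temperature` are hypothetical placeholders for your own values. `try_float` only mimics the null-on-failure behavior of TRY_CAST; it is not how Stream Analytics performs the cast.

```python
import json
from datetime import datetime, timezone

# Hypothetical placeholders: replace with your job's start time and field names.
JOB_START = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
TIMESTAMP_FIELD = "EventTime"   # column used in Timestamp By (assumed name)
NUMERIC_FIELD = "Temperature"   # column the query casts to float (assumed name)


def parse_ts(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' or a missing offset means UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def try_float(value):
    """Return float(value), or None where a cast would fail (similar in spirit to TRY_CAST)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def check_events(events):
    """Return (events timestamped before the job start, events whose cast would fail)."""
    early, bad_cast = [], []
    for event in events:
        if parse_ts(event[TIMESTAMP_FIELD]) < JOB_START:
            early.append(event)
        if try_float(event.get(NUMERIC_FIELD)) is None:
            bad_cast.append(event)
    return early, bad_cast


# In practice, load the downloaded sample file instead, e.g. with json.load().
sample = json.loads("""[
  {"EventTime": "2024-01-01T11:59:00Z", "Temperature": "21.5"},
  {"EventTime": "2024-01-01T12:05:00Z", "Temperature": "n/a"}
]""")
early, bad_cast = check_events(sample)
print(f"{len(early)} event(s) before job start, {len(bad_cast)} failed cast(s)")
```

Events in the first list would be dropped by a job started at `JOB_START`; events in the second list would make a plain CAST fail.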

Resource utilization is high

Ensure that you take advantage of parallelization in Azure Stream Analytics. You can scale Stream Analytics jobs with query parallelization by configuring input partitions and tuning the analytics query definition.

Debug queries progressively

In real-time data processing, knowing what the data looks like in the middle of the query can be helpful. You can see this by using the job diagram in Visual Studio. If you don't have Visual Studio, you can take additional steps to output intermediate data.

Because inputs or steps of an Azure Stream Analytics job can be read multiple times, you can write extra SELECT INTO statements. Doing so outputs intermediate data into storage and lets you inspect the correctness of the data, just as watch variables do when you debug a program.

The following example query in an Azure Stream Analytics job has one stream input, two reference data inputs, and an output to Azure Table storage. The query joins data from the event hub and two reference blobs to get the name and category information:

[Figure: Stream Analytics SELECT INTO query example]

Note that the job is running, but no events are being produced in the output. On the Monitoring tile, shown here, you can see that the input is producing data, but you don't know which step of the JOIN caused all the events to be dropped.

[Figure: Stream Analytics Monitoring tile]

In this situation, you can add a few extra SELECT INTO statements to "log" the intermediate JOIN results and the data that's read from the input.

In this example, we've added two new "temporary outputs." They can be any sink you like. Here we use Azure Storage as an example:

[Figure: Adding extra SELECT INTO statements to a Stream Analytics query]

You can then rewrite the query like this:

[Figure: Rewritten SELECT INTO Stream Analytics query]

Now start the job again and let it run for a few minutes. Then query temp1 and temp2 with Visual Studio Cloud Explorer to produce the following tables:

[Figure: temp1 table, SELECT INTO temp1 table Stream Analytics query]

[Figure: temp2 table, SELECT INTO temp2 table Stream Analytics query]

As you can see, temp1 and temp2 both have data, and the name column is populated correctly in temp2. However, output1 still contains no data, so something is wrong:

[Figure: SELECT INTO output1 table containing no data, Stream Analytics query]
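The reasoning in this walkthrough amounts to finding the first stage whose output is empty: temp1 and temp2 have rows but output1 does not, so events are lost in the step between them. If you download each temporary output as a CSV file (an assumption; your sinks may use another format), a small Python sketch like the following makes that check explicit. The stage names mirror this example, and the file contents are hypothetical placeholders.

```python
import csv
import io


def count_rows(csv_text):
    """Count data rows (header excluded) in a CSV export of one output."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)  # ignore blank lines


def locate_drop(stage_counts):
    """Given (stage_name, row_count) pairs in query order, return the first pair
    of adjacent stages where rows stop flowing, or None if every stage has rows."""
    for (prev_name, prev_count), (name, count) in zip(stage_counts, stage_counts[1:]):
        if prev_count > 0 and count == 0:
            return prev_name, name
    return None


# Hypothetical exports; in practice, read each downloaded file's contents.
exports = {
    "temp1": "id,from\n1,6f1c2d3e-4a5b-4c6d-8e7f-0123456789ab\n",
    "temp2": "id,from,name\n1,6f1c2d3e-4a5b-4c6d-8e7f-0123456789ab,Sensor A\n",
    "output1": "id,name,category\n",
}
stages = [(name, count_rows(text)) for name, text in exports.items()]
print(locate_drop(stages))  # ('temp2', 'output1')
```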

By sampling the data, you can be almost certain that the issue is with the second JOIN. You can download the reference data from the blob and take a look:

[Figure: SELECT INTO ref table, Stream Analytics query]

As you can see, the format of the GUID in this reference data is different from the format of the [from] column in temp2. That's why the data didn't arrive in output1 as expected.
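To confirm that a JOIN fails because of formatting rather than missing reference rows, you can compare sampled keys from both sides. The following Python sketch treats the JOIN condition as an exact string comparison (an assumption of this sketch) and separates keys that only differ in formatting from keys that are truly absent. The sample GUIDs, including the braced uppercase form on the reference side, are hypothetical; the actual formats in this example are only visible in the screenshots.

```python
def loose_key(guid):
    """Formatting-insensitive form of a GUID string (no braces, no hyphens, lowercase)."""
    return guid.strip().strip("{}").replace("-", "").lower()


def diagnose_join_keys(stream_keys, reference_keys):
    """Classify stream join keys against reference keys, treating the JOIN as an
    exact string comparison: matched, format mismatch only, or truly missing."""
    exact = set(reference_keys)
    loose = {loose_key(key) for key in reference_keys}
    matched, format_mismatch, missing = [], [], []
    for key in stream_keys:
        if key in exact:
            matched.append(key)
        elif loose_key(key) in loose:
            format_mismatch.append(key)  # same GUID, different text: the JOIN drops it
        else:
            missing.append(key)  # no reference row at all
    return matched, format_mismatch, missing


# Hypothetical sample values from temp2 [from] and from the reference blob.
temp2_from = ["6f1c2d3e-4a5b-4c6d-8e7f-0123456789ab", "00000000-0000-0000-0000-000000000001"]
reference_ids = ["{6F1C2D3E-4A5B-4C6D-8E7F-0123456789AB}"]
matched, format_mismatch, missing = diagnose_join_keys(temp2_from, reference_ids)
print(len(matched), len(format_mismatch), len(missing))  # 0 1 1
```

A non-empty `format_mismatch` list with an empty `matched` list points to the same situation as in this walkthrough: the data exists on both sides, but the keys are written differently.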

You can fix the data format, upload it to the reference blob, and try again:

[Figure: SELECT INTO temp table, Stream Analytics query]

This time, the data in the output is formatted and populated as expected.

[Figure: SELECT INTO final table, Stream Analytics query]
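In this walkthrough, the fix was made to the reference file itself before re-uploading it. For a larger file, a Python sketch like the following could rewrite the key column locally. It assumes the reference data is a CSV file with a key column named `id` (hypothetical), and that the stream's [from] column uses the lowercase, hyphenated 8-4-4-4-12 form that `str(uuid.UUID(...))` produces; if your stream uses a different format, adapt the conversion accordingly.

```python
import csv
import io
import uuid


def normalize_reference_csv(csv_text, key_column):
    """Rewrite one GUID column of a reference-data CSV into lowercase, hyphenated
    8-4-4-4-12 form; uuid.UUID accepts braces, missing hyphens, and any case."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[key_column] = str(uuid.UUID(row[key_column].strip()))
        writer.writerow(row)
    return out.getvalue()


# Hypothetical reference file; "id" is a hypothetical key column name.
reference = "id,name,category\n{6F1C2D3E-4A5B-4C6D-8E7F-0123456789AB},Sensor A,Temperature\n"
print(normalize_reference_csv(reference, "id"), end="")
# id,name,category
# 6f1c2d3e-4a5b-4c6d-8e7f-0123456789ab,Sensor A,Temperature
```

`uuid.UUID` raises `ValueError` for a value that is not a valid GUID, which surfaces bad rows before the file is uploaded rather than silently producing a key that never matches.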

Get help

For further assistance, try the Microsoft Q&A question page for Azure Stream Analytics.

Next steps