Leverage query parallelization in Azure Stream Analytics

This article shows you how to take advantage of parallelization in Azure Stream Analytics. You learn how to scale Stream Analytics jobs by configuring input partitions and tuning the analytics query definition. As a prerequisite, you may want to be familiar with the notion of streaming units described in Understand and adjust Streaming Units.

What are the parts of a Stream Analytics job?

A Stream Analytics job definition includes at least one streaming input, a query, and an output. Inputs are where the job reads the data stream from. The query transforms the data input stream, and the output is where the job sends the results.

Partitions in inputs and outputs

Partitioning lets you divide data into subsets based on a partition key. If your input (for example, Event Hubs) is partitioned by a key, we highly recommend specifying this partition key when adding an input to your Stream Analytics job. Scaling a Stream Analytics job takes advantage of partitions in the input and output. A Stream Analytics job can consume and write different partitions in parallel, which increases throughput.
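How a partition key groups events can be sketched in a few lines of Python. This is only an illustration of the property that matters (the mapping is deterministic, so equal keys always land in the same partition); the hashing that Event Hubs actually applies is internal and different:

```python
import hashlib

def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index.

    Illustrative only: Event Hubs uses its own internal hash, but the
    property is the same -- the mapping is deterministic, so all events
    that share a key are routed to the same partition.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count
```

Because the function is deterministic, repeated calls with the same key and partition count always return the same partition, which is what lets one query instance see every event for a given key.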

Inputs

All Azure Stream Analytics inputs can take advantage of partitioning:

  • Event Hubs (needs the partition key set explicitly with the PARTITION BY keyword when using compatibility level 1.1 or below)
  • IoT Hub (needs the partition key set explicitly with the PARTITION BY keyword when using compatibility level 1.1 or below)
  • Blob storage

Outputs

When you work with Stream Analytics, you can take advantage of partitioning in the outputs:

  • Azure Functions
  • Azure Table
  • Blob storage (can set the partition key explicitly)
  • Cosmos DB (needs the partition key set explicitly)
  • Event Hubs (needs the partition key set explicitly)
  • IoT Hub (needs the partition key set explicitly)
  • Service Bus
  • SQL and SQL Data Warehouse with optional partitioning: see the Output to Azure SQL Database page for more information.

Power BI doesn't support partitioning. However, you can still partition the input as described in this section.

For more information about partitions, see the related articles in the Azure Stream Analytics documentation.

Embarrassingly parallel jobs

An embarrassingly parallel job is the most scalable scenario in Azure Stream Analytics. It connects one partition of the input to one instance of the query to one partition of the output. This parallelism has the following requirements:

  1. If your query logic depends on the same key being processed by the same query instance, you must make sure that the events go to the same partition of your input. For Event Hubs or IoT Hub, this means that the event data must have the PartitionKey value set. Alternatively, you can use partitioned senders. For blob storage, this means that the events are sent to the same partition folder. An example would be a query instance that aggregates data per userID where the input event hub is partitioned using userID as the partition key. However, if your query logic doesn't require the same key to be processed by the same query instance, you can ignore this requirement. An example of this logic would be a simple select-project-filter query.

  2. The next step is to make sure your query is partitioned. For jobs with compatibility level 1.2 or higher (recommended), a custom column can be specified as the partition key in the input settings, and the job is parallelized automatically. Jobs with compatibility level 1.0 or 1.1 require you to use PARTITION BY PartitionId in all the steps of your query. Multiple steps are allowed, but they all must be partitioned by the same key.

  3. Most of the outputs supported in Stream Analytics can take advantage of partitioning. If you use an output type that doesn't support partitioning, your job won't be embarrassingly parallel. For Event Hubs outputs, ensure the partition key column is set to the same partition key used in the query. Refer to the output section for more details.

  4. The number of input partitions must equal the number of output partitions. Blob storage output can support partitions and inherits the partitioning scheme of the upstream query. When a partition key for blob storage is specified, data is partitioned per input partition, so the result is still fully parallel. Here are examples of partition values that allow a fully parallel job:

    • 8 event hub input partitions and 8 event hub output partitions
    • 8 event hub input partitions and blob storage output
    • 8 event hub input partitions and blob storage output partitioned by a custom field with arbitrary cardinality
    • 8 blob storage input partitions and blob storage output
    • 8 blob storage input partitions and 8 event hub output partitions
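Requirement #4, with the blob storage exception, amounts to a simple predicate. The helper below is a hypothetical sketch, not part of any Azure SDK:

```python
def is_fully_parallel(input_partitions, output_partitions, output_is_blob=False):
    """Check requirement #4: input and output partition counts must match.

    Blob storage output inherits the partitioning scheme of the upstream
    query, so any input partition count stays fully parallel.
    """
    if output_is_blob:
        return True
    return input_partitions == output_partitions
```

For example, 8 event hub input partitions with 8 event hub output partitions pass the check, while 8 input partitions with 32 output partitions do not.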

The following sections discuss some example scenarios that are embarrassingly parallel.

Simple query

  • Input: Event hub with 8 partitions
  • Output: Event hub with 8 partitions ("Partition key column" must be set to use "PartitionId")

Query:

    --Using compatibility level 1.2 or above
    SELECT TollBoothId
    FROM Input1
    WHERE TollBoothId > 100
    
    --Using compatibility level 1.0 or 1.1
    SELECT TollBoothId
    FROM Input1 PARTITION BY PartitionId
    WHERE TollBoothId > 100

This query is a simple filter. Therefore, we don't need to worry about partitioning the input that's sent to the event hub. Notice that jobs with a compatibility level before 1.2 must include the PARTITION BY PartitionId clause, which fulfills requirement #2 from earlier. For the output, we need to configure the event hub output in the job to have the partition key set to PartitionId. One last check is to make sure that the number of input partitions equals the number of output partitions.
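To see why a select-project-filter query needs no particular input keying, consider this small Python simulation: filtering each partition independently yields the same results as filtering the whole stream, because no event depends on any other event.

```python
# Two input partitions of toll events; the partitioning key is irrelevant here.
partitions = [
    [{"TollBoothId": 150}, {"TollBoothId": 90}],   # partition 0
    [{"TollBoothId": 120}],                        # partition 1
]

def keep(event):
    return event["TollBoothId"] > 100  # WHERE TollBoothId > 100

# Filter each partition on its own, as a parallel job would.
per_partition = [[e for e in p if keep(e)] for p in partitions]
flattened = [e for p in per_partition for e in p]

# Filter the flattened stream, as a single-instance job would.
whole_stream = [e for e in partitions[0] + partitions[1] if keep(e)]

assert sorted(e["TollBoothId"] for e in flattened) == \
       sorted(e["TollBoothId"] for e in whole_stream)
```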

Query with a grouping key

  • Input: Event hub with 8 partitions
  • Output: Blob storage

Query:

    --Using compatibility level 1.2 or above
    SELECT COUNT(*) AS Count, TollBoothId
    FROM Input1
    GROUP BY TumblingWindow(minute, 3), TollBoothId
    
    --Using compatibility level 1.0 or 1.1
    SELECT COUNT(*) AS Count, TollBoothId
    FROM Input1 Partition By PartitionId
    GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId

This query has a grouping key. Therefore, the events grouped together must be sent to the same event hub partition. Since in this example we group by TollBoothID, we should make sure that TollBoothID is used as the partition key when the events are sent to Event Hubs. Then in ASA, we can use PARTITION BY PartitionId to inherit this partition scheme and enable full parallelization. Since the output is blob storage, we don't need to worry about configuring a partition key value, as per requirement #4.
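The reason the grouping key should also be the partition key can be simulated in Python: when the sender routes events by TollBoothID, each partition ends up holding every event for its booths, so per-partition counts are already complete and need no cross-partition merge. The routing function here is illustrative:

```python
from collections import Counter

events = [{"TollBoothId": b} for b in [1, 2, 1, 3, 2, 1]]
PARTITIONS = 2

# Route by the grouping key, as the sender should.
shards = [[] for _ in range(PARTITIONS)]
for e in events:
    shards[e["TollBoothId"] % PARTITIONS].append(e)  # toy routing rule

# Count independently per partition; each booth lives in exactly one shard,
# so every per-partition count is already the final answer for that booth.
counts = Counter()
for shard in shards:
    counts.update(e["TollBoothId"] for e in shard)

assert counts == Counter({1: 3, 2: 2, 3: 1})
```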

Example scenarios that aren't embarrassingly parallel

In the previous section, we showed some embarrassingly parallel scenarios. In this section, we discuss scenarios that don't meet all the requirements to be embarrassingly parallel.

Mismatched partition count

  • Input: Event hub with 8 partitions
  • Output: Event hub with 32 partitions

If the input partition count doesn't match the output partition count, the topology isn't embarrassingly parallel irrespective of the query. However, we can still get some level of parallelization.

Query using non-partitioned output

  • Input: Event hub with 8 partitions
  • Output: Power BI

Power BI output doesn't currently support partitioning. Therefore, this scenario isn't embarrassingly parallel.

Multi-step query with different PARTITION BY values

  • Input: Event hub with 8 partitions
  • Output: Event hub with 8 partitions
  • Compatibility level: 1.0 or 1.1

Query:

    WITH Step1 AS (
    SELECT COUNT(*) AS Count, TollBoothId, PartitionId
    FROM Input1 Partition By PartitionId
    GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId
    )

    SELECT SUM(Count) AS Count, TollBoothId
    FROM Step1 Partition By TollBoothId
    GROUP BY TumblingWindow(minute, 3), TollBoothId

As you can see, the second step uses TollBoothId as the partitioning key. This step isn't the same as the first step, so it requires a shuffle.
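What a shuffle means in practice can be sketched in Python: records produced under one key layout are regrouped under a new key, which forces rows to move across partition boundaries before the second step can aggregate them. The data below is made up for illustration:

```python
# Step 1 output lives in partitions keyed by PartitionId.
step1 = {
    0: [("booth-A", 2), ("booth-B", 1)],   # (TollBoothId, Count) pairs
    1: [("booth-A", 4), ("booth-C", 5)],
}

# Re-keying by TollBoothId means rows for the same booth, currently held in
# different partitions, must be brought together: this movement is the shuffle.
shuffled = {}
for rows in step1.values():
    for booth, count in rows:
        shuffled.setdefault(booth, []).append(count)

# Step 2 can now sum per booth, since each booth's rows are colocated.
totals = {booth: sum(counts) for booth, counts in shuffled.items()}
```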

Multi-step query with different PARTITION BY values

  • Input: Event hub with 8 partitions
  • Output: Event hub with 8 partitions ("Partition key column" must be set to use "TollBoothId")
  • Compatibility level: 1.2 or above

Query:

    WITH Step1 AS (
    SELECT COUNT(*) AS Count, TollBoothId
    FROM Input1
    GROUP BY TumblingWindow(minute, 3), TollBoothId
    )

    SELECT SUM(Count) AS Count, TollBoothId
    FROM Step1
    GROUP BY TumblingWindow(minute, 3), TollBoothId

Compatibility level 1.2 or above enables parallel query execution by default. For example, the query from the previous section is partitioned as long as the "TollBoothId" column is set as the input partition key. The PARTITION BY PartitionId clause isn't required.

Calculate the maximum streaming units of a job

The total number of streaming units that can be used by a Stream Analytics job depends on the number of steps in the query defined for the job and the number of partitions for each step.

Steps in a query

A query can have one or many steps. Each step is a subquery defined by the WITH keyword. The query that is outside the WITH keyword (one query only) is also counted as a step, such as the SELECT statement in the following query:

Query:

    WITH Step1 AS (
        SELECT COUNT(*) AS Count, TollBoothId
        FROM Input1 Partition By PartitionId
        GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId
    )
    SELECT SUM(Count) AS Count, TollBoothId
    FROM Step1
    GROUP BY TumblingWindow(minute,3), TollBoothId

This query has two steps.

Note

This query is discussed in more detail later in the article.

Partition a step

Partitioning a step requires the following conditions:

  • The input source must be partitioned.
  • The SELECT statement of the query must read from a partitioned input source.
  • The query within the step must have the PARTITION BY keyword.

When a query is partitioned, the input events are processed and aggregated in separate partition groups, and output events are generated for each of the groups. If you want a combined aggregate, you must create a second, non-partitioned step to aggregate.

Calculate the max streaming units for a job

All non-partitioned steps together can scale up to six streaming units (SUs) for a Stream Analytics job. In addition, you can add 6 SUs for each partition in a partitioned step. The following examples show how this works.

  • The query contains one step. The step isn't partitioned.
    Max SUs for the job: 6
  • The input data stream is partitioned by 16. The query contains one step. The step is partitioned.
    Max SUs for the job: 96 (6 * 16 partitions)
  • The query contains two steps. Neither step is partitioned.
    Max SUs for the job: 6
  • The input data stream is partitioned by 3. The query contains two steps. The input step is partitioned and the second step isn't. The SELECT statement reads from the partitioned input.
    Max SUs for the job: 24 (18 for the partitioned step + 6 for the non-partitioned step)
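The figures above all follow one rule: up to 6 SUs for all non-partitioned steps combined, plus 6 SUs per partition for each partitioned step. A small helper expressing that rule (an illustration, not an official API):

```python
def max_streaming_units(partitioned_step_partitions, has_non_partitioned_step):
    """Max SUs for a job.

    partitioned_step_partitions: partition count for each partitioned step,
    contributing 6 SUs per partition. All non-partitioned steps together
    contribute at most 6 SUs.
    """
    sus = 6 * sum(partitioned_step_partitions)
    if has_non_partitioned_step:
        sus += 6
    return sus
```

Evaluating it for the examples above reproduces 6, 96, 6, and 24.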

Examples of scaling

The following query calculates the number of cars within a three-minute window going through a toll station that has three tollbooths. This query can be scaled up to six SUs.

    SELECT COUNT(*) AS Count, TollBoothId
    FROM Input1
    GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId

To use more SUs for the query, both the input data stream and the query must be partitioned. Since the data stream partition is set to 3, the following modified query can be scaled up to 18 SUs:

    SELECT COUNT(*) AS Count, TollBoothId
    FROM Input1 Partition By PartitionId
    GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId

When a query is partitioned, the input events are processed and aggregated in separate partition groups. Output events are also generated for each of the groups. Partitioning can cause unexpected results when the GROUP BY field isn't the partition key in the input data stream. For example, the TollBoothId field in the previous query isn't the partition key of Input1. As a result, the data from TollBooth #1 can be spread across multiple partitions.

Each of the Input1 partitions is processed separately by Stream Analytics. As a result, multiple records of the car count for the same tollbooth in the same tumbling window are created. If the input partition key can't be changed, this problem can be fixed by adding a non-partitioned step to aggregate values across partitions, as in the following example:

    WITH Step1 AS (
        SELECT COUNT(*) AS Count, TollBoothId
        FROM Input1 Partition By PartitionId
        GROUP BY TumblingWindow(minute, 3), TollBoothId, PartitionId
    )

    SELECT SUM(Count) AS Count, TollBoothId
    FROM Step1
    GROUP BY TumblingWindow(minute, 3), TollBoothId

This query can be scaled to 24 SUs.
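The duplicate-record problem and its two-step fix can be simulated in Python: when a booth's events are spread across partitions, the partitioned step emits one count per booth per partition, and the non-partitioned step sums those partial counts back into a single record per booth. The data is made up for illustration:

```python
from collections import Counter

# TollBoothId values per input partition; booth 1's events are spread
# across both partitions because the partition key is not TollBoothId.
partitions = [[1, 1, 2], [1, 3]]

# Step 1 (partitioned): one count record per booth *per partition*.
step1 = [Counter(p) for p in partitions]
records_for_booth1 = sum(c[1] > 0 for c in step1)
assert records_for_booth1 == 2            # duplicate records for booth 1

# Step 2 (non-partitioned): SUM(Count) merges the partial counts.
final = Counter()
for c in step1:
    final.update(c)
assert final[1] == 3                       # single, correct total
```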

Note

If you're joining two streams, make sure that the streams are partitioned by the partition key of the column that you use to create the joins. Also make sure that both streams have the same number of partitions.

Achieving higher throughputs at scale

An embarrassingly parallel job is necessary but not sufficient to sustain a higher throughput at scale. Every storage system, and its corresponding Stream Analytics output, has variations on how to achieve the best possible write throughput. As with any at-scale scenario, there are some challenges that can be solved by using the right configurations. This section discusses configurations for a few common outputs and provides samples for sustaining ingestion rates of 1K, 5K, and 10K events per second.

The following observations use a Stream Analytics job with a stateless (passthrough) query, a basic JavaScript UDF that writes to Event Hubs, Azure SQL DB, or Cosmos DB.

Event Hubs

    Ingestion rate (events per second)    Streaming units    Output resources
    1K                                    1                  2 TU
    5K                                    6                  6 TU
    10K                                   12                 10 TU

The Event Hubs solution scales linearly in terms of streaming units (SUs) and throughput, making it the most efficient and performant way to analyze and stream data out of Stream Analytics. Jobs can be scaled up to 192 SUs, which roughly translates to processing up to 200 MB/s, or 19 trillion events per day.

Azure SQL

    Ingestion rate (events per second)    Streaming units    Output resources
    1K                                    3                  S3
    5K                                    18                 P4
    10K                                   36                 P6

Azure SQL supports writing in parallel, called inherit partitioning, but it's not enabled by default. However, enabling inherit partitioning, along with a fully parallel query, may not be sufficient to achieve higher throughputs. SQL write throughputs depend significantly on your database configuration and table schema. The SQL Output Performance article has more detail about the parameters that can maximize your write throughput. As noted in the Azure Stream Analytics output to Azure SQL Database article, this solution doesn't scale linearly as a fully parallel pipeline beyond 8 partitions and may need repartitioning before the SQL output (see INTO). Premium SKUs are needed to sustain high IO rates along with the overhead from log backups happening every few minutes.

Cosmos DB

    Ingestion rate (events per second)    Streaming units    Output resources
    1K                                    3                  20K RU
    5K                                    24                 60K RU
    10K                                   48                 120K RU

Cosmos DB output from Stream Analytics has been updated to use native integration under compatibility level 1.2. Compatibility level 1.2, which is the default for new jobs, enables significantly higher throughput and reduces RU consumption compared to 1.1. The solution uses Cosmos DB containers partitioned on /deviceId, and the rest of the solution is configured identically.

All Streaming at Scale Azure samples use an event hub fed by load-simulating test clients as input. Each input event is a 1KB JSON document, which translates the configured ingestion rates into throughput rates (1 MB/s, 5 MB/s, and 10 MB/s) easily. Events simulate an IoT device sending the following JSON data (in a shortened form) for up to 1K devices:

    {
        "eventId": "b81d241f-5187-40b0-ab2a-940faf9757c0",
        "complexData": {
            "moreData0": 51.3068118685458,
            "moreData22": 45.34076957651598
        },
        "value": 49.02278128887753,
        "deviceId": "contoso://device-id-1554",
        "type": "CO2",
        "createdAt": "2019-05-16T17:16:40.000003Z"
    }
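A hedged sketch of a generator that produces events in this shape. Field names mirror the sample above; the number of moreData fields, the value ranges, and the device count are assumptions, since the sample is shown in shortened form:

```python
import json
import random
import uuid
from datetime import datetime, timezone

def make_event(device_count=1000):
    """Build one synthetic IoT event matching the sample's shape.

    Assumptions: moreData0..moreData22 (count inferred from the shortened
    sample) and values uniform in [0, 100); the real test clients may differ.
    """
    return {
        "eventId": str(uuid.uuid4()),
        "complexData": {f"moreData{i}": random.uniform(0, 100) for i in range(23)},
        "value": random.uniform(0, 100),
        "deviceId": f"contoso://device-id-{random.randrange(device_count)}",
        "type": "CO2",
        "createdAt": datetime.now(timezone.utc).isoformat(),
    }

payload = json.dumps(make_event())  # serialized form, roughly 1KB
```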

Note

The configurations are subject to change due to the various components used in the solution. For a more accurate estimate, customize the samples to fit your scenario.

Identifying bottlenecks

Use the Metrics pane in your Azure Stream Analytics job to identify bottlenecks in your pipeline. Review Input/Output Events for throughput, and Watermark Delay or Backlogged Events, to see whether the job is keeping up with the input rate. For Event Hubs metrics, look for Throttled Requests and adjust the threshold units accordingly. For Cosmos DB metrics, review Max consumed RU/s per partition key range under Throughput to ensure your partition key ranges are uniformly consumed. For Azure SQL DB, monitor Log IO and CPU.

Get help

For further assistance, try our Microsoft Q&A question page for Azure Stream Analytics.

Next steps