Outputs from Azure Stream Analytics

An Azure Stream Analytics job consists of an input, a query, and an output. You can send transformed data to several output types, and this article lists the ones that Stream Analytics supports. When you design your Stream Analytics query, refer to the output by name in the INTO clause. A job can use a single output or, if needed, multiple outputs; for multiple outputs, add multiple INTO clauses to the query.

To create, edit, and test Stream Analytics job outputs, you can use the Azure portal, Azure PowerShell, the .NET API, or the REST API.

Some output types support partitioning, and output batch sizes vary to optimize throughput. The following table shows the features supported by each output type:

| Output type | Partitioning | Security |
|---|---|---|
| Azure SQL Database | Yes, optional. | SQL user auth, Managed Identity (preview) |
| Blob storage and Azure Data Lake Gen 2 | Yes | Access key, Managed Identity (preview) |
| Azure Event Hubs | Yes, need to set the partition key column in output configuration. | Access key, Managed Identity (preview) |
| Azure Table storage | Yes | Account key |
| Azure Service Bus queues | Yes | Access key |
| Azure Service Bus topics | Yes | Access key |
| Azure Cosmos DB | Yes | Access key |
| Azure Functions | Yes | Access key |

Partitioning

Stream Analytics supports partitions for all of the outputs listed above. For more information on partition keys and the number of output writers, see the article for the specific output type you're interested in. All output articles are linked in the previous section.

For more advanced tuning of the partitions, you can control the number of output writers with an INTO <partition count> clause (see INTO) in your query, which can help you achieve a desired job topology. If your output adapter is not partitioned, a lack of data in one input partition delays output by up to the late arrival amount of time. In such cases, the output is merged to a single writer, which might cause bottlenecks in your pipeline. To learn more about the late arrival policy, see Azure Stream Analytics event order considerations.

Output batch size

All outputs support batching, but only some support an explicit batch size. Azure Stream Analytics uses variable-size batches to process events and write to outputs. The Stream Analytics engine typically doesn't write one message at a time; it uses batches for efficiency. When the rates of both incoming and outgoing events are high, Stream Analytics uses larger batches. When the egress rate is low, it uses smaller batches to keep latency low.

Parquet output batching window properties

When you use Azure Resource Manager template deployment or the REST API, the two batching window properties are:

  1. timeWindow

    The maximum wait time per batch. The value should be a Timespan string; for example, "00:02:00" for two minutes. After this time, the batch is written to the output even if the minimum row requirement is not met. The default value is 1 minute and the allowed maximum is 2 hours. If your blob output has a path pattern frequency, the wait time cannot exceed the partition time range.

  2. sizeWindow

    The minimum number of rows per batch. For Parquet, every batch creates a new file. The current default value is 2,000 rows and the allowed maximum is 10,000 rows.
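The limits described for the two properties can be checked before a payload is submitted. The following Python sketch is illustrative only: the function names `parse_time_window`, `check_size_window`, and `should_write_batch` are hypothetical, the parser assumes the simple "hh:mm:ss" form used in the example, and the requirement that both values be greater than zero is an assumption. The flush rule is one reading of the description above, not the service's actual implementation.

```python
from datetime import timedelta

# Defaults and maximums quoted in the property descriptions above.
DEFAULT_TIME_WINDOW = timedelta(minutes=1)
MAX_TIME_WINDOW = timedelta(hours=2)
DEFAULT_SIZE_WINDOW = 2000
MAX_SIZE_WINDOW = 10000


def parse_time_window(value: str) -> timedelta:
    """Parse an "hh:mm:ss" string and enforce the 2-hour maximum.

    Assumption: only the plain hh:mm:ss form is handled here.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    window = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    # Assumption: a zero or negative wait time is treated as invalid.
    if not timedelta(0) < window <= MAX_TIME_WINDOW:
        raise ValueError(f"timeWindow {value!r} must be positive and at most 2 hours")
    return window


def check_size_window(rows: int) -> int:
    """Enforce the 10,000-row maximum (positive lower bound is an assumption)."""
    if not 0 < rows <= MAX_SIZE_WINDOW:
        raise ValueError(f"sizeWindow {rows} must be between 1 and {MAX_SIZE_WINDOW}")
    return rows


def should_write_batch(
    buffered_rows: int,
    waited: timedelta,
    time_window: timedelta = DEFAULT_TIME_WINDOW,
    size_window: int = DEFAULT_SIZE_WINDOW,
) -> bool:
    """Hypothetical flush rule: write once the minimum row count is buffered,
    or once the maximum wait time has elapsed, whichever comes first."""
    return buffered_rows >= size_window or waited >= time_window
```

With the defaults, a batch of 10 rows that has waited a full minute would be written, while the same batch after 59 seconds would not.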

These batching window properties are supported only by API version 2017-04-01-preview. The following is an example of the JSON payload for a REST API call:

"type": "stream",
      "serialization": {
        "type": "Parquet",
        "properties": {}
      },
      "timeWindow": "00:02:00",
      "sizeWindow": "2000",
      "datasource": {
        "type": "Microsoft.Storage/Blob",
        "properties": {
          "storageAccounts" : [
            {
              "accountName": "{accountName}",
              "accountKey": "{accountKey}"
            }
          ],
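
The fields shown in the fragment above can also be assembled programmatically. The following Python sketch builds only those fields and serializes them with the standard `json` module, which avoids hand-editing mistakes such as stray commas. The function name `parquet_output_fragment` is hypothetical, and the result is a partial fragment that mirrors the example, not a complete request body.

```python
import json


def parquet_output_fragment(
    account_name: str,
    account_key: str,
    time_window: str = "00:02:00",
    size_window: int = 2000,
) -> dict:
    """Build only the fields that appear in the example payload fragment."""
    return {
        "type": "stream",
        "serialization": {"type": "Parquet", "properties": {}},
        "timeWindow": time_window,
        # The example payload passes the row count as a string, so mirror that.
        "sizeWindow": str(size_window),
        "datasource": {
            "type": "Microsoft.Storage/Blob",
            "properties": {
                "storageAccounts": [
                    {"accountName": account_name, "accountKey": account_key}
                ],
            },
        },
    }


# Placeholders are kept as in the example; substitute real values before use.
fragment = parquet_output_fragment("{accountName}", "{accountKey}")
print(json.dumps(fragment, indent=2))
```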

Next steps