Quickstart: Create and monitor an Apache Storm topology in Azure HDInsight

Apache Storm is a scalable, fault-tolerant, distributed, real-time computation system for processing streams of data. With Storm on Azure HDInsight, you can create a cloud-based Storm cluster that performs big data analytics in real time.

In this quickstart, you use an example from the Apache storm-starter project to create an Apache Storm topology on an existing Apache Storm cluster and then monitor it.

Prerequisites

Create the topology

  1. Connect to your Storm cluster. Edit the command below by replacing CLUSTERNAME with the name of your Storm cluster, and then enter the command:

    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.cn
    
  2. The WordCount example is included on your HDInsight cluster at /usr/hdp/current/storm-client/contrib/storm-starter/. The topology generates random sentences and counts how many times each word occurs. Use the following command to start the topology on the cluster under the name wordcount:

    storm jar /usr/hdp/current/storm-client/contrib/storm-starter/storm-starter-topologies-*.jar org.apache.storm.starter.WordCountTopology wordcount
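The two steps above can also be combined into a single non-interactive call by passing the `storm jar` command to `ssh` as a remote command. The following is a minimal sketch under stated assumptions: `submit_command` is a hypothetical helper, `mycluster` is a placeholder cluster name, and the helper only prints the command so that it can be reviewed before it is run.

```shell
# Directory that holds the storm-starter examples on the cluster (from step 2).
STARTER_DIR=/usr/hdp/current/storm-client/contrib/storm-starter

# Hypothetical helper: print (do not run) the ssh command that submits
# the WordCount topology in one step.
#   $1 = cluster name (replaces CLUSTERNAME)
#   $2 = name to give the running topology, for example wordcount
submit_command() {
  cluster="$1"
  name="$2"
  # The single quotes keep the * from expanding locally, so the remote shell
  # resolves the jar file name on the cluster.
  printf "ssh sshuser@%s-ssh.azurehdinsight.cn 'storm jar %s/storm-starter-topologies-*.jar org.apache.storm.starter.WordCountTopology %s'\n" \
    "$cluster" "$STARTER_DIR" "$name"
}

# "mycluster" is a placeholder, not a real cluster.
submit_command mycluster wordcount
```

Printing the command first is a deliberate choice here: the printed line can be copied into a terminal once the cluster name has been checked.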
    

Monitor the topology

Storm provides a web interface for working with running topologies. This interface, the Storm UI, is included on your HDInsight cluster.

Use the following steps to monitor the topology using the Storm UI:

  1. To display the Storm UI, open a web browser to https://CLUSTERNAME.azurehdinsight.cn/stormui. Replace CLUSTERNAME with the name of your cluster.

  2. Under Topology Summary, select the wordcount entry in the Name column. Information about the topology is displayed.

    Image: Storm dashboard showing information about the storm-starter WordCount topology.

    The new page provides the following information:

    | Property | Description |
    |---|---|
    | Topology stats | Basic information on the topology performance, organized into time windows. Selecting a specific time window changes the time window for information displayed in other sections of the page. |
    | Spouts | Basic information about spouts, including the last error returned by each spout. |
    | Bolts | Basic information about bolts. |
    | Topology configuration | Detailed information about the topology configuration. |
    | Activate | Resumes processing of a deactivated topology. |
    | Deactivate | Pauses a running topology. |
    | Rebalance | Adjusts the parallelism of the topology. You should rebalance running topologies after you change the number of nodes in the cluster, so that parallelism compensates for the added or removed nodes. For more information, see Understanding the parallelism of an Apache Storm topology. |
    | Kill | Terminates a Storm topology after the specified timeout. |

  3. From this page, select an entry from the Spouts or Bolts section. Information about the selected component is displayed.

    Image: Storm dashboard showing information about the selected component.

    The new page displays the following information:

    | Property | Description |
    |---|---|
    | Spout/Bolt stats | Basic information on the component performance, organized into time windows. Selecting a specific time window changes the time window for information displayed in other sections of the page. |
    | Input stats (bolt only) | Information on components that produce data consumed by the bolt. |
    | Output stats | Information on data emitted by this component. |
    | Executors | Information on instances of this component. |
    | Errors | Errors produced by this component. |

  4. When viewing the details of a spout or bolt, select an entry from the Port column in the Executors section to view details for a specific instance of the component.

     2015-01-27 14:18:02 b.s.d.task [INFO] Emitting: split default ["with"]
     2015-01-27 14:18:02 b.s.d.task [INFO] Emitting: split default ["nature"]
     2015-01-27 14:18:02 b.s.d.executor [INFO] Processing received message source: split:21, stream: default, id: {}, [snow]
     2015-01-27 14:18:02 b.s.d.task [INFO] Emitting: count default [snow, 747293]
     2015-01-27 14:18:02 b.s.d.executor [INFO] Processing received message source: split:21, stream: default, id: {}, [white]
     2015-01-27 14:18:02 b.s.d.task [INFO] Emitting: count default [white, 747293]
     2015-01-27 14:18:02 b.s.d.executor [INFO] Processing received message source: split:21, stream: default, id: {}, [seven]
     2015-01-27 14:18:02 b.s.d.task [INFO] Emitting: count default [seven, 1493957]
    

    In this example, the word seven has occurred 1493957 times. This count is the number of times the word has been encountered since the topology was started.
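Reading running counts out of log output like the sample above can be scripted. The sketch below assumes the log lines have exactly the format shown in the sample and are supplied on standard input; `latest_counts` is a hypothetical helper name, and where the log text comes from (for example, copied from the Storm UI) is left to the reader.

```shell
# Hypothetical helper: read log lines on stdin and print "word count"
# for the most recent count emitted for each word, sorted by word.
latest_counts() {
  awk '
    # Only lines emitted by the count bolt carry a running total.
    /Emitting: count default \[/ {
      line = $0
      sub(/^.*Emitting: count default \[/, "", line)  # drop everything up to "["
      sub(/\].*$/, "", line)                          # drop the closing "]"
      split(line, kv, /, */)                          # kv[1] = word, kv[2] = count
      last[kv[1]] = kv[2]                             # later lines overwrite earlier ones
    }
    END { for (w in last) print w, last[w] }
  ' | sort
}

# Usage with three lines taken from the sample output above.
latest_counts <<'EOF'
2015-01-27 14:18:02 b.s.d.task [INFO] Emitting: split default ["with"]
2015-01-27 14:18:02 b.s.d.task [INFO] Emitting: count default [snow, 747293]
2015-01-27 14:18:02 b.s.d.task [INFO] Emitting: count default [seven, 1493957]
EOF
# prints:
# seven 1493957
# snow 747293
```

Lines from the split bolt are ignored because they carry individual words rather than totals.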

Stop the topology

Return to the Topology summary page for the wordcount topology, and then select the Kill button from the Topology actions section. When prompted, enter 10 as the number of seconds to wait before stopping the topology. After the timeout period, the topology no longer appears when you visit the Storm UI section of the dashboard.
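The Storm command-line client also has a `kill` command whose `-w` option sets the wait time in seconds, so the same stop action can be sketched from the SSH session used earlier. This is an illustration, not part of the original walkthrough: `kill_command` is a hypothetical helper that validates the wait time and prints the command instead of running it.

```shell
# Hypothetical helper: print (do not run) the Storm CLI command that stops
# a topology after a wait period.
#   $1 = topology name, for example wordcount
#   $2 = seconds to wait before the topology is stopped
kill_command() {
  name="$1"
  wait_secs="$2"
  # Reject empty or non-numeric wait times before building the command.
  case "$wait_secs" in
    ''|*[!0-9]*)
      echo "wait time must be a whole number of seconds" >&2
      return 1
      ;;
  esac
  printf 'storm kill %s -w %s\n' "$name" "$wait_secs"
}

# Mirrors the values used in the Storm UI steps above.
kill_command wordcount 10
```

Run the printed command on the cluster only after confirming the topology name in the Storm UI.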

Clean up resources

After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are charged for an HDInsight cluster even when it is not in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they are not in use.

To delete a cluster, see Delete an HDInsight cluster using your browser, PowerShell, or the Azure CLI.

Next steps

In this quickstart, you used an example from the Apache storm-starter project to create an Apache Storm topology on an existing Apache Storm cluster and to monitor it. Advance to the next article to learn the basics of managing and monitoring Apache Storm topologies.