Deploy and manage Apache Storm topologies on Azure HDInsight

This document covers the basics of managing and monitoring Apache Storm topologies running on Storm on HDInsight clusters.

Prerequisites

Submit a topology: Visual Studio

The HDInsight Tools can be used to submit C# or hybrid topologies to your Storm cluster. The following steps use a sample application. For information about creating topologies with the HDInsight Tools, see Develop C# topologies using the HDInsight Tools for Visual Studio.

  1. If you have not already installed the latest version of the Data Lake Tools for Visual Studio, see Get started using Data Lake Tools for Visual Studio.

    Note

    The Data Lake Tools for Visual Studio were formerly called the HDInsight Tools for Visual Studio.

    The Data Lake Tools for Visual Studio are included in the Azure Workload for Visual Studio 2017.

  2. Open Visual Studio, and then select File > New > Project.

  3. In the New Project dialog box, expand Installed > Templates, and then select HDInsight. From the list of templates, select Storm Sample. At the bottom of the dialog box, type a name for the application.

  4. In Solution Explorer, right-click the project, and select Submit to Storm on HDInsight.

    Note

    If prompted, enter the login credentials for your Azure subscription. If you have more than one subscription, log in to the one that contains your Storm on HDInsight cluster.

  5. Select your Storm on HDInsight cluster from the Storm Cluster drop-down list, and then select Submit. You can use the Output window to monitor whether the submission succeeds.

Submit a topology: SSH and the Storm command

  1. Use SSH to connect to the HDInsight cluster. Replace USERNAME with the name of your SSH login. Replace CLUSTERNAME with your HDInsight cluster name:

     ssh USERNAME@CLUSTERNAME-ssh.azurehdinsight.cn
    

    For more information on using SSH to connect to your HDInsight cluster, see Use SSH with HDInsight.

  2. Use the following command to start an example topology:

     storm jar /usr/hdp/current/storm-client/contrib/storm-starter/storm-starter-topologies-*.jar org.apache.storm.starter.WordCountTopology WordCount
    

    This command starts the example WordCount topology on the cluster. This topology randomly generates sentences, and then counts the occurrences of each word in the sentences.

    Note

    To submit your own topology, you must first copy the jar file containing the topology to the cluster before using the storm command. To copy the file to the cluster, you can use the scp command. For example, scp FILENAME.jar USERNAME@CLUSTERNAME-ssh.azurehdinsight.cn:FILENAME.jar

    The WordCount example, and other storm-starter examples, are already included on your cluster at /usr/hdp/current/storm-client/contrib/storm-starter/.
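The copy-then-submit sequence described above can be sketched as a small Python helper that only assembles the command lines. This is a minimal sketch, not part of the HDInsight tooling: the function names are hypothetical, and the sketch assumes the SSH endpoint pattern CLUSTERNAME-ssh.azurehdinsight.cn shown in this article and that the jar is copied to the SSH user's home directory under its own file name.

```python
import os
import shlex
from typing import List


def ssh_host(cluster_name: str) -> str:
    """Return the SSH endpoint for a cluster, following the pattern used in this article."""
    return f"{cluster_name}-ssh.azurehdinsight.cn"


def scp_command(jar_file: str, username: str, cluster_name: str) -> List[str]:
    """Build the scp command that copies a topology jar to the cluster.

    The remote file keeps the jar's base name (an assumption of this sketch).
    """
    remote_name = os.path.basename(jar_file)
    return ["scp", jar_file, f"{username}@{ssh_host(cluster_name)}:{remote_name}"]


def storm_jar_command(jar_path: str, main_class: str, topology_name: str) -> List[str]:
    """Build the 'storm jar' command that submits a topology from an SSH session."""
    return ["storm", "jar", jar_path, main_class, topology_name]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing touches a real cluster.
    print(shlex.join(scp_command("FILENAME.jar", "USERNAME", "CLUSTERNAME")))
    print(shlex.join(storm_jar_command(
        "FILENAME.jar", "org.apache.storm.starter.WordCountTopology", "WordCount")))
```

Building the commands as argument lists rather than one string avoids quoting problems if a file name contains spaces; the lists can be passed to subprocess.run unchanged.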

Submit a topology: programmatically

You can programmatically deploy a topology using the Nimbus service. https://github.com/Azure-Samples/hdinsight-java-deploy-storm-topology provides an example Java application that demonstrates how to deploy and start a topology through the Nimbus service.

Monitor and manage: Visual Studio

When a topology is submitted using Visual Studio, the Storm Topologies view appears. Select the topology from the list to view information about the running topology.

Note

You can also view Storm Topologies from Server Explorer by expanding Azure > HDInsight, right-clicking a Storm on HDInsight cluster, and then selecting View Storm Topologies.

Select the shape for a spout or bolt to view information about that component. A new window opens for each item selected.

Deactivate and reactivate

Deactivating a topology pauses it until it is killed or reactivated. To perform these operations, use the Deactivate and Reactivate buttons at the top of the Topology Summary.

Rebalance

Rebalancing a topology allows the system to revise the parallelism of the topology. For example, if you have resized the cluster to add more nodes, rebalancing allows a topology to see the new nodes.

To rebalance a topology, use the Rebalance button at the top of the Topology Summary.

Warning

Rebalancing a topology first deactivates the topology, then redistributes workers evenly across the cluster, and finally returns the topology to the state it was in before rebalancing occurred. So if the topology was active, it becomes active again. If it was deactivated, it remains deactivated.

Kill a topology

Storm topologies continue running until they are stopped or the cluster is deleted. To stop a topology, use the Kill button at the top of the Topology Summary.

Monitor and manage: SSH and the Storm command

The storm utility allows you to work with running topologies from the command line. Use storm -h for a full list of commands.

List topologies

Use the following command to list all running topologies:

storm list

This command returns information similar to the following text:

Topology_name        Status     Num_tasks  Num_workers  Uptime_secs
-------------------------------------------------------------------
WordCount            ACTIVE     29         2            263
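If you want to consume this listing from a script, the table can be parsed into records. The following Python function is a minimal sketch that assumes the whitespace-separated column layout shown above; the function name is hypothetical, and the real command may print additional log lines or different columns depending on the Storm version, which this sketch simply skips when the field count does not match the header.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]


def parse_storm_list(output: str) -> List[Row]:
    """Parse the text printed by 'storm list' into one dictionary per topology."""
    header: List[str] = []
    rows: List[Row] = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        if not header:
            # Ignore everything until the header row appears.
            if fields[0] == "Topology_name":
                header = [name.lower() for name in fields]
            continue
        if set(line.strip()) == {"-"}:
            continue  # separator row made of dashes
        if len(fields) != len(header):
            continue  # not a table row (for example, a log line)
        row: Row = dict(zip(header, fields))
        for key, value in list(row.items()):
            # Convert purely numeric columns (counts, uptime) to integers.
            if isinstance(value, str) and value.isdigit():
                row[key] = int(value)
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = """\
Topology_name        Status     Num_tasks  Num_workers  Uptime_secs
-------------------------------------------------------------------
WordCount            ACTIVE     29         2            263
"""
    for topology in parse_storm_list(sample):
        print(topology["topology_name"], topology["status"], topology["uptime_secs"])
```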

Deactivate and reactivate

Deactivating a topology pauses it until it is killed or reactivated. Use the following commands to deactivate and reactivate a topology:

storm deactivate TOPOLOGYNAME

storm activate TOPOLOGYNAME

Kill a running topology

Storm topologies, once started, continue running until stopped. To stop a topology, use the following command:

storm kill TOPOLOGYNAME

Rebalance

Rebalancing a topology allows the system to revise the parallelism of the topology. For example, if you have resized the cluster to add more nodes, rebalancing allows a topology to see the new nodes.

Warning

Rebalancing a topology first deactivates the topology, then redistributes workers evenly across the cluster, and finally returns the topology to the state it was in before rebalancing occurred. So if the topology was active, it becomes active again. If it was deactivated, it remains deactivated.

storm rebalance TOPOLOGYNAME
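The rebalance command also accepts optional arguments. As a hedged illustration, the Python sketch below assembles the argument list using the -w (wait seconds), -n (new number of workers), and -e (component=parallelism) options as described in the Apache Storm command-line documentation; confirm them with storm help rebalance on your cluster before relying on them. The function name and the component name "split" are illustrative assumptions, not taken from this article.

```python
from typing import Dict, List, Optional


def rebalance_command(
    topology_name: str,
    wait_secs: Optional[int] = None,
    num_workers: Optional[int] = None,
    executors: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Build a 'storm rebalance' argument list; options are added only when given."""
    command = ["storm", "rebalance", topology_name]
    if wait_secs is not None:
        # Seconds the topology stays deactivated before workers are redistributed.
        command += ["-w", str(wait_secs)]
    if num_workers is not None:
        # New total number of worker processes for the topology.
        command += ["-n", str(num_workers)]
    for component, parallelism in (executors or {}).items():
        # New executor count for a single spout or bolt.
        command += ["-e", f"{component}={parallelism}"]
    return command


if __name__ == "__main__":
    # Plain rebalance, equivalent to the command shown above.
    print(" ".join(rebalance_command("TOPOLOGYNAME")))
    # Rebalance onto 4 workers after a 10 second wait, with 8 executors for "split".
    print(" ".join(rebalance_command("TOPOLOGYNAME", 10, 4, {"split": 8})))
```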

Monitor and manage: Storm UI

The Storm UI provides a web interface for working with running topologies, and is included on your HDInsight cluster. To view the Storm UI, use a web browser to open https://CLUSTERNAME.azurehdinsight.cn/stormui, where CLUSTERNAME is the name of your cluster.

Note

If asked to provide a user name and password, enter the cluster administrator user name (admin) and the password that you used when creating the cluster.

Main page

The main page of the Storm UI provides the following information:

  • Cluster summary: Basic information about the Storm cluster.
  • Topology summary: A list of running topologies. Use the links in this section to view more information about specific topologies.
  • Supervisor summary: Information about the Storm supervisors.
  • Nimbus configuration: Nimbus configuration for the cluster.

Topology summary

Selecting a link from the Topology summary section displays the following information about the topology:

  • Topology summary: Basic information about the topology.

  • Topology actions: Management actions that you can perform for the topology.

    • Activate: Resumes processing of a deactivated topology.

    • Deactivate: Pauses a running topology.

    • Rebalance: Adjusts the parallelism of the topology. You should rebalance running topologies after you have changed the number of nodes in the cluster. This operation allows the topology to adjust parallelism to compensate for the increased or decreased number of nodes in the cluster.

      For more information, see Understanding the parallelism of an Apache Storm topology.

    • Kill: Terminates a Storm topology after the specified timeout.

  • Topology stats: Statistics about the topology. To set the timeframe for the remaining entries on the page, use the links in the Window column.

  • Spouts: The spouts used by the topology. Use the links in this section to view more information about specific spouts.

  • Bolts: The bolts used by the topology. Use the links in this section to view more information about specific bolts.

  • Topology configuration: The configuration of the selected topology.

Spout and Bolt summary

Selecting a spout or bolt from the Spouts or Bolts sections displays the following information about the selected item:

  • Component summary: Basic information about the spout or bolt.
  • Spout/Bolt stats: Statistics about the spout or bolt. To set the timeframe for the remaining entries on the page, use the links in the Window column.
  • Input stats (bolt only): Information about the input streams consumed by the bolt.
  • Output stats: Information about the streams emitted by the spout or bolt.
  • Executors: Information about the instances of the spout or bolt. Select the Port entry for a specific executor to view a log of diagnostic information produced for this instance.
  • Errors: Any error information for the spout or bolt.

Monitor and manage: REST API

The Storm UI is built on top of the REST API, so you can perform similar management and monitoring tasks by using the REST API directly. You can use the REST API to create custom tools for managing and monitoring Storm topologies.

For more information, see Apache Storm UI REST API. The following information is specific to using the REST API with Apache Storm on HDInsight.

Important

The Storm REST API is not publicly available over the internet, and must be accessed using an SSH tunnel to the HDInsight cluster head node. For information on creating and using an SSH tunnel, see Use SSH Tunneling to access Apache Ambari web UI, ResourceManager, JobHistory, NameNode, Apache Oozie, and other web UIs.

Base URI

The base URI for the REST API on Linux-based HDInsight clusters is available on the head node at https://HEADNODEFQDN:8744/api/v1/. The domain name of the head node is generated during cluster creation and is not static.

You can find the fully qualified domain name (FQDN) for the cluster head node in several different ways:

  • From an SSH session: Use the command headnode -f from an SSH session to the cluster.
  • From Ambari Web: Select Services from the top of the page, then select Storm. From the Summary tab, select Storm UI Server. The FQDN of the node that hosts the Storm UI and REST API is displayed at the top of the page.
  • From Ambari REST API: Use the command curl -u admin -G "https://CLUSTERNAME.azurehdinsight.cn/api/v1/clusters/CLUSTERNAME/services/STORM/components/STORM_UI_SERVER" to retrieve information about the node that the Storm UI and REST API are running on. Replace CLUSTERNAME with the cluster name. When prompted, enter the password for the login (admin) account. In the response, the "host_name" entry contains the FQDN of the node.
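For the Ambari REST API option, the "host_name" entry can be pulled out of the JSON response programmatically. The following Python sketch makes as few assumptions as possible about the response layout: it searches the whole document for "host_name" keys instead of relying on a fixed path. The function names and the sample response are hypothetical and only illustrate the idea; port 8744 and the /api/v1/ path come from the base URI given above.

```python
import json
from typing import Any, List


def find_host_names(node: Any) -> List[str]:
    """Collect every string stored under a 'host_name' key, at any nesting depth."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "host_name" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_host_names(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_host_names(item))
    return found


def storm_base_uri(ambari_response: str, port: int = 8744) -> str:
    """Build the Storm REST API base URI from an Ambari component response."""
    hosts = find_host_names(json.loads(ambari_response))
    if not hosts:
        raise ValueError("no host_name entry found in the Ambari response")
    return f"https://{hosts[0]}:{port}/api/v1/"


if __name__ == "__main__":
    # Hypothetical response shape, trimmed to the part that matters here.
    sample = json.dumps({
        "host_components": [
            {"HostRoles": {"component_name": "STORM_UI_SERVER",
                           "host_name": "hn0-example.internal.example.net"}}
        ]
    })
    print(storm_base_uri(sample))
```

Because the head node name is not static, resolving it this way each time is safer than hard-coding the FQDN in a tool.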

Authentication

Requests to the REST API must use basic authentication, so you use the HDInsight cluster administrator name and password.

Note

Because basic authentication sends credentials in clear text, you should always use HTTPS to secure communications with the cluster.

Return values

Information that is returned from the REST API may only be usable from within the cluster. For example, the fully qualified domain name (FQDN) returned for Apache ZooKeeper servers is not accessible from the internet.

Next steps

Learn how to Develop Java-based topologies using Apache Maven.

For a list of more example topologies, see Example topologies for Apache Storm on HDInsight.
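To make the authentication requirement concrete, the following Python sketch prepares a request that carries a basic authentication header, using only the standard library. It is an illustration under stated assumptions: the function name is hypothetical, the topology/summary path is taken from the Apache Storm UI REST API documentation, and the password is a placeholder. The request is built but not sent, because sending it requires the SSH tunnel described earlier.

```python
import base64
import urllib.request


def build_storm_request(base_uri: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a REST API request with a basic authentication header."""
    url = base_uri.rstrip("/") + "/" + path.lstrip("/")
    # Basic authentication is the base64 encoding of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    request = build_storm_request(
        "https://HEADNODEFQDN:8744/api/v1/", "topology/summary", "admin", "PASSWORD")
    print(request.full_url)
    # To send it through an established tunnel, you could call
    # urllib.request.urlopen(request) and parse the JSON body.
```

The header value is only encoded, not encrypted, which is why the HTTPS recommendation above matters.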