Deploy and manage Apache Storm topologies on Azure HDInsight

This document covers the basics of managing and monitoring Apache Storm topologies that run on Storm on HDInsight clusters.

Prerequisites

Submit a topology using Visual Studio

You can use the Data Lake Tools for Visual Studio to submit C# or hybrid topologies to your Storm cluster. The following steps use a sample application. For information about creating topologies with the Data Lake Tools, see Apache Storm topologies with Visual Studio and C#.

  1. If you haven't already installed the latest version of the Data Lake Tools for Visual Studio, see Use Data Lake Tools for Visual Studio.

    Note

    The Azure Data Lake and Stream Analytics Tools were formerly called the HDInsight Tools for Visual Studio.

    Azure Data Lake and Stream Analytics Tools for Visual Studio are included in the Azure development workload for Visual Studio 2019.

  2. Start Visual Studio.

  3. In the Start window, select Create a new project.

  4. In the Create a new project window, select the search box and enter Storm. Then choose Storm Sample from the result list and select Next.

  5. In the Configure your new project window, enter a Project name, and go to or create a Location to save the new project in. Then select Create.

    [Image: Configure your new project window, Visual Studio]

  6. From Server Explorer, right-click Azure, select Connect to Microsoft Azure Subscription..., and complete the sign-in process.

  7. From Solution Explorer, right-click the project, and choose Submit to Storm on HDInsight.

    Note

    If prompted, enter the sign-in credentials for your Azure subscription. If you have more than one subscription, sign in to the one that contains your Storm on HDInsight cluster.

  8. In the Submit Topology dialog box, under the Storm Cluster drop-down list, choose your Storm on HDInsight cluster, and then select Submit. You can check whether the submission succeeded by viewing the Output pane.

Submit a topology using SSH and the Storm command

  1. Use the ssh command to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:

    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.cn
    
  2. From your SSH session, use the following command to start the WordCount example topology:

    storm jar /usr/hdp/current/storm-client/contrib/storm-starter/storm-starter-topologies-*.jar org.apache.storm.starter.WordCountTopology WordCount
    

    This command starts the example WordCount topology on the cluster. This topology randomly generates sentences, and then counts the occurrences of each word in the sentences.

    Note

    When you submit your own topology to the cluster, you must first copy the .jar file that contains the topology to the cluster before using the storm command. To copy the file, you can use the scp command. For example, enter scp FILENAME.jar USERNAME@CLUSTERNAME-ssh.azurehdinsight.cn:FILENAME.jar.

    The WordCount example, and other storm-starter examples, are already included on your cluster at /usr/hdp/current/storm-client/contrib/storm-starter/.
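The copy-then-submit sequence described in the note can be sketched as a small dry-run helper. This is only an illustration: the function name submit_commands, the cluster name mycluster, the jar path, and the class com.example.MyTopology are hypothetical, and the sketch assumes the SSH user is sshuser and that the storm command is on the PATH of a non-interactive SSH session. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that copy a topology jar to the
# cluster and submit it. Nothing is executed here.
# All names passed in the example call below are hypothetical.

submit_commands() {
    # $1 = cluster name, $2 = local jar path,
    # $3 = main class of the topology, $4 = topology name
    host="sshuser@$1-ssh.azurehdinsight.cn"
    jar_name="$(basename "$2")"
    # Step 1: copy the jar to the SSH user's home directory on the cluster.
    printf 'scp %s %s:%s\n' "$2" "$host" "$jar_name"
    # Step 2: submit the topology with the storm command over SSH.
    printf 'ssh %s storm jar %s %s %s\n' "$host" "$jar_name" "$3" "$4"
}

submit_commands mycluster target/mytopology.jar com.example.MyTopology MyTopology
```

Review the printed commands before running them; each one prompts for the SSH password or uses your SSH key.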

Submit a topology programmatically

You can programmatically deploy a topology using the Nimbus service. https://github.com/Azure-Samples/hdinsight-java-deploy-storm-topology provides an example Java application that demonstrates how to deploy and start a topology through the Nimbus service.

Monitor and manage a topology in Visual Studio

When you submit a topology using Visual Studio, the Storm Topologies View window appears. Select the topology from the list to view information about the running topology.

[Image: Monitor topology, Storm Topologies View window, Visual Studio]

Note

You can also view Storm Topologies from Server Explorer. Expand Azure > HDInsight, right-click a Storm on HDInsight cluster, and then select View Storm Topologies.

Select the shape for a spout or bolt to view information about that component. A tooltip with component information appears for the selected item.

Deactivate and reactivate a topology

Deactivating a topology pauses it until the topology is killed or reactivated. To do these operations, use the Deactivate and Reactivate buttons in the Actions area at the top of the Storm Topologies View window.

Rebalance a topology

Rebalancing a topology allows the system to revise the parallelism of the topology. For example, if you've resized the cluster to add more nodes, rebalancing allows a topology to see the new nodes.

To rebalance a topology, use the Rebalance button in the Actions area of the Storm Topologies View window.

Warning

Rebalancing a topology deactivates the topology, redistributes workers evenly across the cluster, and then returns the topology to the state it was in before rebalancing occurred. If the topology was active, it becomes active again. If the topology was deactivated, it remains deactivated.

Kill a running topology

Storm topologies continue running until they're stopped or the cluster is deleted. To stop a topology, use the Kill button in the Actions area.

Monitor and manage a topology using SSH and the Storm command

The storm utility allows you to work with running topologies from the command line. Use storm -h for a full list of commands.

List topologies

Use the following command to list all running topologies:

storm list

This command returns information similar to the following text:

Topology_name        Status     Num_tasks  Num_workers  Uptime_secs
-------------------------------------------------------------------
WordCount            ACTIVE     29         2            263
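A listing like the one above is easy to filter in a script. The following is a minimal sketch, assuming the column layout shown in the sample output (topology name first, status second); the function name active_topologies and the second topology in the sample are made up for illustration.

```shell
#!/bin/sh
# Print the names of topologies whose Status column is ACTIVE.
# Reads `storm list` output on stdin; the header and separator lines are
# skipped automatically because their second field is not "ACTIVE".
active_topologies() {
    awk '$2 == "ACTIVE" { print $1 }'
}

# Sample input modelled on the listing above; "Paused" is a made-up topology.
active_topologies <<'EOF'
Topology_name        Status     Num_tasks  Num_workers  Uptime_secs
-------------------------------------------------------------------
WordCount            ACTIVE     29         2            263
Paused               INACTIVE   12         1            90
EOF
```

On the cluster, the same function can be fed directly with storm list | active_topologies.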

Deactivate and reactivate a topology

Deactivating a topology pauses it until the topology is killed or reactivated. Use the following commands to deactivate or reactivate a topology:

storm deactivate TOPOLOGYNAME
storm activate TOPOLOGYNAME

Kill a running topology

Storm topologies, once started, continue running until stopped. To stop a topology, use the following command:

storm kill TOPOLOGYNAME

Rebalance a topology

Rebalancing a topology allows the system to revise the parallelism of the topology. For example, if you've resized the cluster to add more nodes, rebalancing allows a topology to see the new nodes.

Warning

Rebalancing a topology deactivates the topology, redistributes workers evenly across the cluster, and then returns the topology to the state it was in before rebalancing occurred. If the topology was active, it becomes active again. If it was deactivated, it remains deactivated.

storm rebalance TOPOLOGYNAME
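The storm client also accepts optional arguments for some of these commands: -w (wait time in seconds) for kill and rebalance, and -n (new number of workers) and -e (component=parallelism) for rebalance. The following dry-run sketch shows how they might be used. The run wrapper and the DRY_RUN variable are helpers invented for this example, and the component name split is an assumption based on the storm-starter WordCount topology. The commands are meant to be issued one at a time, not as an end-to-end script.

```shell
#!/bin/sh
# Dry-run sketch of topology management with the storm command.
# With DRY_RUN=1 (the default here) each command is printed, not executed.
# Set DRY_RUN=0 inside an SSH session on the cluster to run them for real.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

TOPOLOGY="WordCount"   # topology name as shown by `storm list`

# Pause the topology, then resume it.
run storm deactivate "$TOPOLOGY"
run storm activate "$TOPOLOGY"

# Wait 30 seconds, then spread the topology over 4 workers and give the
# component named "split" (assumed name) 8 executors.
run storm rebalance "$TOPOLOGY" -w 30 -n 4 -e split=8

# Deactivate the spouts, wait 10 seconds for in-flight tuples, then kill.
run storm kill "$TOPOLOGY" -w 10
```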

Monitor and manage a topology using the Storm UI

The Storm UI provides a web interface for working with running topologies, and it's included on your HDInsight cluster. To view the Storm UI, use a web browser to open https://CLUSTERNAME.azurehdinsight.cn/stormui, where CLUSTERNAME is the name of your cluster.

Note

If you're asked to provide a user name and password, enter the cluster administrator user name and password that you used when creating the cluster.

Storm UI main page

The main page of the Storm UI provides the following information:

| Section | Description |
| --- | --- |
| Cluster summary | Basic information about the Storm cluster. |
| Nimbus summary | A list of basic Nimbus information. |
| Topology summary | A list of running topologies. To view more information about a specific topology, select its link in the Name column. |
| Supervisor summary | Information about the Storm supervisor. To see the worker resources associated with a specific supervisor, select its link in the Host or Id column. |
| Nimbus configuration | Nimbus configuration for the cluster. |

The Storm UI main page looks similar to this web page:

[Image: Storm UI main page]

Topology summary

Selecting a link from the Topology summary section displays the following information about the topology:

| Section | Description |
| --- | --- |
| Topology summary | Basic information about the topology. |
| Topology actions | Management actions that you can do for the topology. The available actions are described later in this section. |
| Topology stats | Statistics about the topology. To set the time frame for an entry in this section, select its link in the Window column. |
| Spouts (time frame) | The spouts used by the topology. To view more information about a specific spout, select its link in the Id column. |
| Bolts (time frame) | The bolts used by the topology. To view more information about a specific bolt, select its link in the Id column. |
| Worker resources | A list of worker resources. To view more information about a specific worker resource, select its link in the Host column. |
| Topology visualization | A Show Visualization button that displays a visualization of the topology. |
| Topology configuration | The configuration of the selected topology. |

The Storm topology summary page looks similar to this web page:

[Image: Storm UI topology summary page]

In the Topology actions section, you can select the following buttons to do an action:

| Button | Description |
| --- | --- |
| Activate | Resumes processing of a deactivated topology. |
| Deactivate | Pauses a running topology. |
| Rebalance | Adjusts the parallelism of the topology. You should rebalance running topologies after you've changed the number of nodes in the cluster. This operation allows the topology to adjust parallelism to compensate for the additional or reduced number of nodes in the cluster. For more information, see Understanding the parallelism of an Apache Storm topology. |
| Kill | Terminates a Storm topology after the specified timeout. |
| Debug | Begins a debugging session for the running topology. |
| Stop Debug | Ends the debugging session for the running topology. |
| Change Log Level | Modifies the debugging log level. |

Spout and bolt summary

Selecting a spout or bolt from the Spouts or Bolts sections displays the following information about the selected item:

| Section | Description |
| --- | --- |
| Component summary | Basic information about the spout or bolt. |
| Component actions | Debug and Stop Debug buttons. |
| Spout stats or Bolt stats | Statistics about the spout or bolt. To set the time frame for an entry in this section, select its link in the Window column. |
| Input stats (time frame), bolt only | Information about the input streams consumed by the bolt. |
| Output stats (time frame) | Information about the streams emitted by the spout or bolt. |
| Profiling and debugging | Controls for profiling and debugging the components on this page. You can set the Status / Timeout (Minutes) value, and you can select buttons for JStack, Restart Worker, and Heap. |
| Executors (time frame) | Information about the instances of the spout or bolt. To view a log of diagnostic information produced for this instance, select the Port entry for a specific executor. You can also see the worker resources associated with a specific executor by selecting its link in the Host column. |
| Errors | Any error information for the spout or bolt. |

The Storm bolt summary page looks similar to this web page:

[Image: Storm UI bolt summary page]

Monitor and manage the topology using the REST API

The Storm UI is built on top of the REST API, so you can do similar management and monitoring tasks by using the REST API. You can use the REST API to create custom tools for managing and monitoring Storm topologies.

For more information, see Apache Storm UI REST API. The following information is specific to using the REST API with Apache Storm on HDInsight.

Important

The Storm REST API is not publicly available over the internet. It must be accessed using an SSH tunnel to the HDInsight cluster head node. For information on creating and using an SSH tunnel, see Use SSH tunneling to access Azure HDInsight.

Base URI

The base URI for the REST API on Linux-based HDInsight clusters is https://HEADNODEFQDN:8744/api/v1/, where you replace HEADNODEFQDN with the fully qualified domain name (FQDN) of the head node. The domain name of the head node is generated during cluster creation and isn't static.

You can find the FQDN for the cluster head node in several ways:

| FQDN discovery method | Description |
| --- | --- |
| SSH session | Use the command headnode -f from an SSH session to the cluster. |
| Ambari Web | On the Ambari cluster web page (https://CLUSTERNAME.azurehdinsight.cn), select Services from the top of the page, then select Storm. From the Summary tab, select Storm UI Server. The FQDN of the node that hosts the Storm UI and REST API is displayed at the top of the page. |
| Ambari REST API | Use the command curl -u admin -G "https://CLUSTERNAME.azurehdinsight.cn/api/v1/clusters/CLUSTERNAME/services/STORM/components/STORM_UI_SERVER" to retrieve information about the node that the Storm UI and REST API are running on. Replace the two instances of CLUSTERNAME with the cluster name. When you're prompted, enter the password for the user (admin) account. In the response, the "host_name" entry of the JSON output contains the FQDN of the node. |
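For the Ambari REST API method, the "host_name" entry can be pulled out of the JSON response with standard text tools. The sketch below is an illustration under assumptions: the function name extract_host_name is invented, and the sample response is a hypothetical, heavily abbreviated stand-in for the real output, which contains many more fields.

```shell
#!/bin/sh
# Extract the first "host_name" value from Ambari JSON read on stdin,
# using only grep, head, and sed (no JSON parser required).
extract_host_name() {
    grep -o '"host_name"[[:space:]]*:[[:space:]]*"[^"]*"' |
        head -n 1 |
        sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# Hypothetical, abbreviated response used only to demonstrate the parsing.
sample='{"host_components":[{"HostRoles":{"component_name":"STORM_UI_SERVER","host_name":"hn0-example.internal.example"}}]}'

fqdn="$(printf '%s\n' "$sample" | extract_host_name)"

# Build the REST API base URI from the discovered FQDN.
printf 'https://%s:8744/api/v1/\n' "$fqdn"
```

Against a real cluster, the output of the curl command from the table can be piped into extract_host_name in place of the sample string.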

Authentication

Requests to the REST API must use basic authentication, so you have to use the administrator name and password for the HDInsight cluster.

Note

Because basic authentication sends credentials in clear text, you should always use HTTPS to secure communications with the cluster.
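As a rough sketch of what authenticated requests might look like, the script below prints curl commands for a few endpoints described in the Apache Storm UI REST API documentation. HEADNODEFQDN and TOPOLOGY_ID are placeholders, the account name admin is an assumption carried over from the Ambari example, and api_command is a helper invented for this illustration. Nothing is sent; the commands are only printed, and they would have to be run from somewhere that can reach the head node, such as through the SSH tunnel described earlier.

```shell
#!/bin/sh
# Dry-run sketch: print curl commands for some Storm UI REST API calls.
# HEADNODEFQDN and TOPOLOGY_ID are placeholders to replace by hand.

BASE_URI="https://HEADNODEFQDN:8744/api/v1/"

api_command() {
    # $1 = HTTP method, $2 = path relative to the base URI.
    # "-u admin" makes curl prompt for the administrator password and
    # send it with basic authentication over HTTPS.
    printf 'curl -u admin -X %s "%s%s"\n' "$1" "$BASE_URI" "$2"
}

api_command GET  "cluster/summary"                  # cluster overview
api_command GET  "topology/summary"                 # running topologies
api_command POST "topology/TOPOLOGY_ID/deactivate"  # pause a topology
api_command POST "topology/TOPOLOGY_ID/activate"    # resume a topology
api_command POST "topology/TOPOLOGY_ID/kill/10"     # kill after 10 seconds
```

Note that the REST API addresses topologies by ID rather than by name; the ID for each topology is part of the topology/summary response.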

Return values

Information that is returned from the REST API may only be usable from within the cluster. For example, the fully qualified domain name (FQDN) returned for Apache ZooKeeper servers isn't accessible from the internet.

Next steps

Learn how to Develop Java-based topologies using Apache Maven.

For a list of more example topologies, see Example Apache Storm topologies in Azure HDInsight.