Apache Ambari usage in Azure HDInsight

HDInsight uses Apache Ambari for cluster deployment and management. An Ambari agent runs on every node (headnode, worker node, ZooKeeper node, and edge node if one exists). The Ambari server runs only on a headnode (hn0 or hn1), and only one instance of the Ambari server should run at a time. This is controlled by the HDInsight failover controller. When one of the headnodes is down for reboot or maintenance, the other headnode becomes active and the Ambari server is started on it.

All cluster configuration should be done through the Ambari UI. Any local change will be overwritten when the node is restarted.

Failover controller services

The HDInsight failover controller is also responsible for updating the IP address of the headnode host, which points to the currently active headnode. All Ambari agents are configured to report their state and heartbeat to the headnode host. The failover controller is a set of services running on every node in the cluster. If these services aren't running, headnode failover may not work correctly, and you'll get an HTTP 502 error when trying to access the Ambari server.

To check which headnode is active, one way is to SSH to one of the nodes in the cluster, run ping headnodehost, and compare the returned IP address with the IP addresses of the two headnodes.
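The comparison can be scripted. The following is a minimal sketch, not an official HDInsight tool: the function name active_headnode and the variables hn0_ip and hn1_ip are hypothetical, and you supply the two headnode IP addresses yourself. It resolves nothing on its own; it only compares the address you pass in.

```shell
# Decide which headnode is active by comparing the address that
# "headnodehost" resolves to against the two headnode addresses.
# Arguments: <headnodehost IP> <hn0 IP> <hn1 IP> (all supplied by the caller).
# active_headnode is a hypothetical helper name, not part of HDInsight.
active_headnode() {
    resolved="$1"
    hn0_ip="$2"
    hn1_ip="$3"
    if [ "$resolved" = "$hn0_ip" ]; then
        echo "hn0"
    elif [ "$resolved" = "$hn1_ip" ]; then
        echo "hn1"
    else
        # The resolved address matches neither headnode.
        echo "unknown"
        return 1
    fi
}

# Example use on a cluster node (hn0_ip and hn1_ip are placeholders you set):
#   resolved=$(getent hosts headnodehost | awk '{print $1}')
#   active_headnode "$resolved" "$hn0_ip" "$hn1_ip"
```

An "unknown" result suggests that the headnode host entry has not been updated, which is worth checking against the failover controller services.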

If the failover controller services aren't running, headnode failover may not happen correctly, and the Ambari server may end up not running. To check whether the failover controller services are running, execute:

ps -ef | grep failover
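The pipeline above usually lists the grep process itself, because its own command line contains the word "failover". As a small sketch, the filter below drops that line so that only real matches remain. The function name failover_procs is hypothetical, and the filter assumes the standard "ps -ef" column layout, in which the command is the eighth field.

```shell
# Read "ps -ef" output on stdin and print only the lines that mention
# "failover", skipping the grep process created by the pipeline itself.
# Assumes the standard "ps -ef" layout, where field 8 is the command (CMD).
# failover_procs is a hypothetical helper name.
failover_procs() {
    awk '/failover/ && $8 != "grep"'
}

# Example use on a cluster node:
#   ps -ef | failover_procs
```

If the filtered output is empty, no failover controller process was found on that node.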

Logs

On the active headnode, you can check the Ambari server logs at:

/var/log/ambari-server/ambari-server.log
/var/log/ambari-server/ambari-server-check-database.log

On any node in the cluster, you can check the Ambari agent logs at:

/var/log/ambari-agent/ambari-agent.log
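When a log is long, it can help to look only at the most recent error entries. The sketch below assumes that error entries contain the literal string "ERROR", which is an assumption about the log format rather than something this article states. The function name recent_errors and its default of 20 lines are hypothetical choices.

```shell
# Print the last N lines (default 20) of a log file that contain "ERROR".
# Assumption: error entries in the log include the literal text "ERROR".
# recent_errors is a hypothetical helper name.
recent_errors() {
    log_file="$1"
    count="${2:-20}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    grep 'ERROR' "$log_file" | tail -n "$count"
}

# Example use (reading the logs may require sudo):
#   recent_errors /var/log/ambari-server/ambari-server.log 50
#   recent_errors /var/log/ambari-agent/ambari-agent.log
```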

Service start sequences

Services start in the following sequence at boot time:

  1. hdinsight-agent starts the failover controller services.
  2. The failover controller services start the Ambari agent on every node and the Ambari server on the active headnode.

Ambari database

HDInsight creates a database in SQL Database under the hood to serve as the database for the Ambari server. The default service tier is S0.

For any cluster created with more than 16 worker nodes, the database service tier is S2.
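The tier rule can be written as a one-line check. This is only an illustration of the rule as stated here, not code that HDInsight runs, and the function name db_tier is hypothetical. The argument is the worker node count at cluster creation time.

```shell
# Print the Ambari database service tier for a given worker node count
# at cluster creation: S2 for more than 16 worker nodes, otherwise S0.
# db_tier is a hypothetical helper name used for illustration only.
db_tier() {
    if [ "$1" -gt 16 ]; then
        echo "S2"
    else
        echo "S0"
    fi
}

# Example use:
#   db_tier 16   # prints S0
#   db_tier 17   # prints S2
```

Note that the boundary is exclusive: a cluster created with exactly 16 worker nodes stays on S0.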

Takeaway points

Never manually start or stop the ambari-server or ambari-agent services, unless you're trying to restart a service to work around an issue. To force a failover, you can reboot the active headnode.

Never manually modify any configuration files on any cluster node. Let the Ambari UI do the job for you.

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support: