Scenario: Apache Ambari UI 502 error in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.

Issue

When you try to access the Apache Ambari UI for your HDInsight cluster, you get a message similar to: "502 - Web server received an invalid response while acting as a gateway or proxy server."

Cause

In general, the HTTP 502 status code means that the Ambari server is not running correctly on the active headnode. There are several possible root causes.

Resolution

In most cases, you can mitigate the problem by restarting the active headnode. Alternatively, choose a larger VM size for your headnode.

Ambari server failed to start

You can check the ambari-server logs to find out why the Ambari server failed to start. One common reason is a database consistency check error. You can confirm this in the following log file: /var/log/ambari-server/ambari-server-check-database.log.
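As an illustration of checking that log, the following is a minimal sketch rather than an official procedure. The helper name show_db_check_errors is hypothetical, and the assumption that problem entries contain the keyword ERROR may not hold for every Ambari version, so read the full log if nothing matches.

```shell
#!/usr/bin/env bash
# Sketch: print lines from the Ambari database consistency check log that
# look like errors. Matching on "ERROR" is an assumption about the log format.

# show_db_check_errors is a hypothetical helper, not part of Ambari.
show_db_check_errors() {
    # Default to the log file named in this article; a different path
    # can be passed as the first argument.
    local log_file="${1:-/var/log/ambari-server/ambari-server-check-database.log}"
    if [ ! -r "$log_file" ]; then
        echo "Cannot read $log_file" >&2
        return 1
    fi
    # -n prints line numbers so each match can be located in the full log.
    grep -n 'ERROR' "$log_file" || echo "No ERROR lines found in $log_file"
}

# Usage on the active headnode (reading the log may require sudo):
# show_db_check_errors
```

Passing the path as an argument keeps the helper usable against a copy of the log taken from the headnode.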

If you made any modifications to the cluster node, undo them. Always use the Ambari UI to modify any Hadoop/Spark related configurations.

Ambari server taking 100% CPU utilization

In rare situations, the ambari-server process has been seen to run constantly at close to 100% CPU utilization. As a mitigation, you can SSH to the active headnode, kill the Ambari server process, and start it again.

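# Find the Ambari server process ID (the grep command itself also appears in the output)
ps -ef | grep AmbariServer
# Confirm that this process is the one consuming CPU
top -p <ambari-server-pid>
# Forcibly terminate the process
kill -9 <ambari-server-pid>
# Start the Ambari server again
service ambari-server start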

Ambari server killed by oom-killer

In some scenarios, your headnode runs out of memory, and the Linux oom-killer starts to pick processes to kill. You can verify this situation by searching for the AmbariServer process ID, which should not be found. Then check /var/log/syslog for an entry like this:

Jul 27 15:29:30 xxx-xxxxxx kernel: [874192.703153] java invoked oom-killer: gfp_mask=0x23201ca, order=0, oom_score_adj=0

Then identify which processes are consuming memory and try to determine the root cause.
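The syslog search and the memory check can be sketched as two small shell helpers. Both function names are hypothetical, and the ps options assume the procps version of ps found on typical Linux distributions. Treat this as a starting point under those assumptions, not a definitive diagnostic.

```shell
#!/usr/bin/env bash
# Sketch: look for oom-killer activity and list the largest memory consumers.

# find_oom_events is a hypothetical helper. It prints syslog lines that
# record an oom-killer invocation, like the sample entry in this article.
find_oom_events() {
    local syslog_file="${1:-/var/log/syslog}"
    grep 'invoked oom-killer' "$syslog_file"
}

# top_memory_processes is a hypothetical helper. It lists the processes with
# the largest resident memory (RSS, in KiB), largest first.
# Assumes procps ps, which supports --sort.
top_memory_processes() {
    local count="${1:-10}"
    # One extra line is kept for the ps header row.
    ps -eo pid,user,rss,comm --sort=-rss | head -n "$((count + 1))"
}

# Usage on the active headnode (reading syslog may require sudo):
# find_oom_events
# top_memory_processes 5
```

Note that top_memory_processes shows current usage only. If the offending process has already been killed, the kernel messages that follow the oom-killer entry in syslog are the better record of what was running at the time.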

Other issues with Ambari server

In rare cases, the Ambari server cannot handle incoming requests. You can find more information by checking the ambari-server logs for errors. One such case is an error like this:

Error Processing URI: /api/v1/clusters/xxxxxx/host_components - (java.lang.OutOfMemoryError) Java heap space
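To see whether this kind of heap error appears in the logs, a count of matching lines is often enough to begin with. The sketch below uses a hypothetical helper, count_heap_errors, and assumes the main server log is at /var/log/ambari-server/ambari-server.log. That path is an assumption and is not stated in this article, so adjust it to match your cluster.

```shell
#!/usr/bin/env bash
# Sketch: count Java heap exhaustion errors in an ambari-server log.

# count_heap_errors is a hypothetical helper, not part of Ambari.
# The default log path is an assumption; pass the real path as an argument.
count_heap_errors() {
    local log_file="${1:-/var/log/ambari-server/ambari-server.log}"
    # -F matches the text literally; -c prints the number of matching lines.
    grep -F -c 'java.lang.OutOfMemoryError' "$log_file"
}

# Usage on the active headnode:
# count_heap_errors
```

A nonzero count suggests the Ambari server is running out of Java heap space rather than failing for an unrelated reason.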

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support: