Scenario: Pegged CPU on region server in Apache HBase cluster in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.

Issue

The Apache HBase region server process starts occupying close to 200% CPU, which causes alerts to fire on the HBase Master process and prevents the cluster from functioning at full capacity.

Cause

If you are running HBase cluster v3.4, you might have hit a potential bug caused by the upgrade of the JDK to version 1.7.0_151. The symptom is that the region server process starts occupying close to 200% CPU. To verify this, run the top command. If a process is occupying close to 200% CPU, get its PID and confirm that it is the region server process by running ps -aux | grep .
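
The check described above can be sketched as a small filter. This is only an illustration: the function name high_cpu_procs and the default threshold of 150 are assumptions, not part of the original guidance. Note also that the %CPU column reported by ps is averaged over the lifetime of the process, so it can differ from the instantaneous figure shown by top.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the rows of `ps -eo pid,pcpu,args` style input
# (read on stdin) whose %CPU value is at or above a threshold.
# The first input line is treated as the ps header and skipped.
high_cpu_procs() {
  local threshold="${1:-150}"   # assumed default, tune as needed
  awk -v t="$threshold" 'NR > 1 && $2 + 0 >= t { print }'
}

# Possible usage on a cluster node (procps ps, as shipped with Ubuntu):
#   ps -eo pid,pcpu,args --sort=-pcpu | high_cpu_procs 150 | grep -i regionserver
```

If the last pipeline prints a line, a process whose command line mentions regionserver is above the threshold.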

Resolution

  1. Install JDK 1.8 on ALL nodes of the cluster, using either of the following options:

    • Run the script action https://raw.githubusercontent.com/Azure/hbase-utils/master/scripts/upgradetojdk18allnodes.sh. Be sure to select the option to run on all nodes.

    • Alternatively, sign in to every individual node and run the command sudo add-apt-repository ppa:openjdk-r/ppa -y && sudo apt-get -y update && sudo apt-get install -y openjdk-8-jdk.

  2. Go to the Ambari UI at https://<clusterdnsname>.azurehdinsight.cn.

  3. Navigate to HBase->Configs->Advanced->Advanced hbase-env configs and change the variable JAVA_HOME to export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64. Save the config change.

  4. [Optional but recommended] Flush all tables on the cluster.

  5. From the Ambari UI again, restart all HBase services that need a restart.

  6. Depending on the data on the cluster, it might take from a few minutes up to an hour for the cluster to reach a stable state. To confirm that the cluster has reached a stable state, either check the HMaster UI from Ambari (refresh the page; all region servers should be active), or run HBase shell from the headnode and then run the status command.
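
The status check in the last step can be scripted roughly as follows. This is a sketch under a stated assumption: the summary line printed by the status command contains text of the form "<N> servers, <M> dead". The exact wording can vary between HBase versions, and the helper name server_counts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HBase shell `status` output on stdin and print
# "<live> <dead>" region server counts.
# ASSUMPTION: the summary line contains "<N> servers, <M> dead"; adjust the
# pattern if your HBase version prints a different summary.
server_counts() {
  grep -oE '[0-9]+ servers, [0-9]+ dead' | head -n 1 | awk '{ print $1, $3 }'
}

# Possible usage on the headnode:
#   echo "status" | hbase shell 2>/dev/null | server_counts
```

A second number of 0, together with a first number equal to the region server count you expect, suggests that all region servers are back.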

To verify that your upgrade was successful, check that the relevant HBase processes were started with the appropriate Java version. For instance, check the region server as follows:

Run ps -aux | grep regionserver and verify that the process was launched with /usr/lib/jvm/java-8-openjdk-amd64/bin/java.
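
As an illustration, the same verification can be reduced to printing only the Java executable of each region server process. The helper name regionserver_java is an assumption, and the sketch assumes that the region server runs as a process whose first command-line token is the path to the java binary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps -eo args` style input on stdin and print the
# java executable (first token) of every line that mentions "regionserver".
# Lines whose first token does not end in /java (for example the grep
# command itself) are ignored.
regionserver_java() {
  awk 'tolower($0) ~ /regionserver/ && $1 ~ /\/java$/ { print $1 }'
}

# Possible usage on a worker node:
#   ps -eo args | regionserver_java
```

After a successful upgrade, the printed path should be /usr/lib/jvm/java-8-openjdk-amd64/bin/java.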

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.