Scenario: BindException - Address already in use in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.

Issue

The restart operation on an Apache HBase Region Server fails to complete. In the region-server.log file in the /var/log/hbase directory on the worker nodes where the region server fails to start, you may see an error message similar to the following:

Caused by: java.net.BindException: Problem binding to /10.2.0.4:16020 : Address already in use
...

Caused by: java.net.BindException: Address already in use
...
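To check quickly whether a worker node has logged this error, you can search the log directory for the exception name. The following is a minimal sketch: the function name `find_bind_errors` and the `*.log` glob are assumptions made for illustration, and only the /var/log/hbase directory comes from this article.

```shell
# Hypothetical helper: print every log line that mentions BindException,
# prefixed with the file name and line number.
# The default directory is the one named in this article; the *.log glob
# is an assumption about how the log files are named.
find_bind_errors() {
  log_dir="${1:-/var/log/hbase}"
  grep -HnF "java.net.BindException" "$log_dir"/*.log 2>/dev/null
}

# Example use on a worker node:
#   find_bind_errors
#   find_bind_errors /var/log/hbase
```

The function prints nothing and returns a non-zero status when no matching line is found.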

Cause

The Apache HBase Region Servers were restarted during heavy workload activity. The following happens behind the scenes when a user initiates a restart of an HBase region server from the Apache Ambari UI:

  1. The Ambari agent sends a stop request to the region server.

  2. The Ambari agent waits 30 seconds for the region server to shut down gracefully.

  3. If your application continues to connect to the region server, the server won't shut down immediately, and the 30-second timeout expires before the shutdown completes.

  4. After 30 seconds, the Ambari agent sends a force-kill (kill -9) command to the region server.

  5. Because of this abrupt shutdown, the port associated with the region server process may not be released even though the process itself has been killed. The next start attempt then fails with the BindException (Address already in use) error.
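The sequence above follows a common "graceful stop, then forced kill" pattern. The sketch below illustrates that pattern in plain shell. It is not Ambari's implementation: the function name `stop_with_timeout` and its output strings are hypothetical, and only the 30-second default mirrors the timeout described in this article.

```shell
# Illustrative only: not Ambari's code.
# Sends SIGTERM, waits up to <timeout> seconds, then falls back to SIGKILL.
# Prints "graceful" if the process exited on its own, "forced" otherwise.
stop_with_timeout() {
  pid="$1"
  timeout="${2:-30}"          # default mirrors the 30 seconds in this article

  # Ask the process to stop; if it is already gone there is nothing to do.
  kill -TERM "$pid" 2>/dev/null || return 0

  waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      # Timeout expired: equivalent of kill -9.
      kill -KILL "$pid" 2>/dev/null
      echo "forced"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "graceful"
}
```

A process that is still busy serving clients when the timeout expires takes the "forced" branch, which is the situation that can leave the port unavailable.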

Resolution

Reduce the load on the HBase region servers before initiating a restart. Also, it's a good idea to first flush all the tables. For a reference on how to flush tables, see HDInsight HBase: How to improve the Apache HBase cluster restart time by flushing tables.
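As a rough illustration of flushing tables, the HBase shell provides a `flush` command that takes a table name. The sketch below only generates those commands from a list of table names. The function name `build_flush_commands` and the table names in the example are hypothetical, and the referenced article remains the authoritative procedure.

```shell
# Hypothetical helper: read table names (one per line) from stdin and
# print one HBase shell flush command per non-empty line.
build_flush_commands() {
  while IFS= read -r table; do
    [ -n "$table" ] && printf "flush '%s'\n" "$table"
  done
  return 0
}

# Example use on a cluster node (assumes the hbase CLI is on PATH and
# that table1 and table2 are placeholder table names):
#   printf 'table1\ntable2\n' | build_flush_commands | hbase shell -n
```

Generating the commands separately makes it possible to review them before piping them into the HBase shell.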

Alternatively, try to manually restart the region servers on the worker nodes by using the following commands:

sudo su - hbase -c "/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh stop regionserver"
sudo su - hbase -c "/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh start regionserver"
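Between the stop and start commands, it can help to confirm that nothing is still listening on the region server port. The sketch below is one possible check, under stated assumptions: port 16020 is taken from the error message in this article, the function name `port_is_listening` is hypothetical, and the input is expected to look like the output of `ss -ltn` (a header line, with the local address and port in the fourth column).

```shell
# Hypothetical helper: read `ss -ltn` style output on stdin and succeed
# (exit status 0) if any listening socket uses the given port.
port_is_listening() {
  awk -v port="$1" '
    NR > 1 {                      # skip the header line
      n = split($4, parts, ":")   # 4th column is Local Address:Port
      if (parts[n] == port) found = 1
    }
    END { exit !found }
  '
}

# Example use on a worker node (assumes the ss utility is installed):
#   while ss -ltn | port_is_listening 16020; do sleep 2; done
```

This only detects listening sockets, so treat it as a rough indication rather than proof that the port can be bound again.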

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.