Scenario: The hbase hbck command returns inconsistencies in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues that occur when interacting with Azure HDInsight clusters.

Issue: Region is not in hbase:meta

Region xxx on HDFS, but not listed in hbase:meta or deployed on any region server.

Cause

Varies.

Resolution

  1. Fix the meta table by running:

    hbase hbck -ignorePreCheckPermission -fixMeta

  2. Assign regions to RegionServers by running:

    hbase hbck -ignorePreCheckPermission -fixAssignments
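The two steps above can be scripted so that they always run in the same order. The following is a minimal Python sketch, not an official tool: the names `run_repairs`, `HBCK_BASE`, and `REPAIR_STEPS` are made up for illustration, and the flag is written as `-fixAssignments`, the spelling hbck itself uses. The exit code of each step is recorded rather than used to abort, on the assumption that hbck may still report remaining inconsistencies between the two steps.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Base command taken from the steps above; the helper names are hypothetical.
HBCK_BASE = ["hbase", "hbck", "-ignorePreCheckPermission"]
REPAIR_STEPS = ["-fixMeta", "-fixAssignments"]  # meta first, then assignments


def run_repairs(
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[Tuple[List[str], int]]:
    """Run each repair step in order and return (command, exit code) pairs.

    `runner` defaults to subprocess.call and can be swapped for a stub
    when no cluster is available.
    """
    results = []
    for flag in REPAIR_STEPS:
        cmd = HBCK_BASE + [flag]
        results.append((cmd, runner(cmd)))
    return results
```

On a cluster head node, calling `run_repairs()` with no arguments would invoke the real `hbase` command. Passing a stub function as `runner` lets you check the command sequence without touching the cluster.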
    

Issue: Region is offline

Region xxx not deployed on any RegionServer. This means the region is in hbase:meta, but offline.

Cause

Varies.

Resolution

Bring regions online by running:

    hbase hbck -ignorePreCheckPermission -fixAssignments

Issue: Regions have the same start/end keys

Cause

Varies.

Resolution

Manually merge the overlapping regions. In the HBase HMaster Web UI, go to the table section and select the link for the table that has the issue. The page lists the start key and end key of each region that belongs to that table. Then, in the HBase shell, merge the overlapping regions by running merge_region 'xxxxxxxx','yyyyyyy', true. For example:

RegionA, startkey:001, endkey:010,

RegionB, startkey:001, endkey:080,

RegionC, startkey:010, endkey:080.

In this scenario, merge RegionA and RegionC to get RegionD, which has the same key range as RegionB, and then merge RegionB and RegionD. The xxxxxxxx and yyyyyyy arguments are the hash strings at the end of each region name. Be careful not to merge two discontinuous regions. After each merge, such as merging A and C, HBase starts a compaction on RegionD. Wait for the compaction to finish before doing another merge with RegionD. You can find the compaction status on that region server's page in the HBase HMaster UI.
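To decide which regions to merge first, it can help to sort the pairs into those that overlap and those that are merely adjacent. The sketch below is illustrative only. It assumes keys are non-empty strings compared lexicographically and that each region covers the half-open range from its start key up to but not including its end key. Real HBase keys are byte arrays, and the first and last regions of a table have empty boundary keys, which this example does not handle. `Region` and `classify_pairs` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    name: str
    start: str  # inclusive start key
    end: str    # exclusive end key


def overlaps(a: Region, b: Region) -> bool:
    """True if the two key ranges share at least one key."""
    return a.start < b.end and b.start < a.end


def adjacent(a: Region, b: Region) -> bool:
    """True if one region ends exactly where the other begins."""
    return a.end == b.start or b.end == a.start


def classify_pairs(regions: List[Region]) -> Dict[str, List[Tuple[str, str]]]:
    """Group every pair of regions as overlapping or adjacent."""
    result = {"overlapping": [], "adjacent": []}
    for a, b in combinations(regions, 2):
        if overlaps(a, b):
            result["overlapping"].append((a.name, b.name))
        elif adjacent(a, b):
            result["adjacent"].append((a.name, b.name))
    return result


# The three regions from the example above.
regions = [
    Region("RegionA", "001", "010"),
    Region("RegionB", "001", "080"),
    Region("RegionC", "010", "080"),
]
report = classify_pairs(regions)
print(report)
```

For the example regions, RegionA and RegionC come out as adjacent, which matches the advice to merge them first. RegionB overlaps both, so it is merged last with the resulting RegionD.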


Issue: Can't load .regioninfo

Can't load .regioninfo for region /hbase/data/default/tablex/regiony.

Cause

This is most likely caused by a partial region deletion when a RegionServer crashes or a VM reboots. Currently, Azure Storage is a flat blob file system, and some file operations are not atomic.

Resolution

Manually clean up the remaining files and folders:

  1. Execute hdfs dfs -ls /hbase/data/default/tablex/regiony to check which folders and files are still under it.

  2. Execute hdfs dfs -rmr /hbase/data/default/tablex/regiony/filez to delete all child files and folders.

  3. Execute hdfs dfs -rmr /hbase/data/default/tablex/regiony to delete the region folder.
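Because these deletions are destructive, it can be worth generating the commands and reviewing them before running anything. The following Python sketch only builds and prints the commands; it does not execute them. `cleanup_commands` is a hypothetical helper, the child names are assumed to come from the `hdfs dfs -ls` output in step 1, and the sketch uses `-rm -r`, the current spelling of the deprecated `-rmr` option.

```python
from typing import List


def cleanup_commands(region_dir: str, children: List[str]) -> List[List[str]]:
    """Build delete commands: child entries first, then the region folder."""
    region_dir = region_dir.rstrip("/")
    cmds = [
        ["hdfs", "dfs", "-rm", "-r", f"{region_dir}/{child}"]
        for child in children
    ]
    cmds.append(["hdfs", "dfs", "-rm", "-r", region_dir])
    return cmds


# Dry run with the placeholder paths used in the steps above.
planned = cleanup_commands("/hbase/data/default/tablex/regiony", ["filez"])
for cmd in planned:
    print(" ".join(cmd))
```

Printing the commands first gives you a chance to confirm that every path sits under the damaged region folder before you run them by hand.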


Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.