Scenario: Joins in Apache Hive lead to an OutOfMemory error in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues that occur when using Interactive Query components in Azure HDInsight clusters.

Issue

The default behavior for Apache Hive joins is to load the entire contents of a table into memory so that the join can be performed without a Map/Reduce step. If the Hive table is too large to fit into memory, the query can fail.
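To check whether a session uses this default, a minimal HiveQL sketch follows. The table and column names (orders, customers, customer_id, and so on) are hypothetical placeholders, not part of the original article; the setting name is the one discussed in the Resolution section.

```sql
-- Print the current value of the setting for this session.
SET hive.auto.convert.join;

-- Show the execution plan for a join without running it.
-- "orders" and "customers" are hypothetical tables used only for illustration.
EXPLAIN
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id;
```

If the plan contains a map join operator, Hive intends to load one side of the join into memory, which is the behavior described above.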

Cause

When a join in Hive involves a sufficiently large table, the following error occurs:

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

Resolution

Prevent Hive from loading tables into memory on joins, so that it performs a Map/Reduce step instead, by setting the following Hive configuration value:

hive.auto.convert.join=false
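One way to apply this value is per session, before running the join. The following is a minimal sketch under that assumption; the table and column names are hypothetical, and making the change permanent for the whole cluster would be done in the cluster's Hive configuration instead.

```sql
-- Disable automatic conversion of joins to in-memory joins for this session only.
SET hive.auto.convert.join=false;

-- Run the join that previously failed.
-- "orders" and "customers" are hypothetical tables used only for illustration.
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id;
```

A session-level setting affects only the current connection, so other queries on the cluster keep the default behavior. The trade-off is that the join may run more slowly, because it no longer avoids the Map/Reduce step.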

Next steps

If setting this value doesn't resolve your issue, visit the following...

  • If you need more help, you can submit a support request from the Azure portal.