Exception when running queries from Apache Ambari Hive View in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues that occur when interacting with Azure HDInsight clusters.

Issue

When you run an Apache Hive query from Apache Ambari Hive View, you intermittently receive the following error message:

Cannot create property 'errors' on string '<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>
<html xmlns='http://www.w3.org/1999/xhtml'>
<head>
<title>IIS 8.5 Detailed Error - 502.3 - Bad Gateway</title>

Cause

A gateway timeout.

The gateway timeout value is 2 minutes. Queries from Ambari Hive View are submitted to the /hive2 endpoint through the gateway. Once the query is successfully compiled and accepted, HiveServer returns a queryid. Clients then keep polling for the status of the query. During this process, if HiveServer doesn't return an HTTP response within 2 minutes, the HDI gateway returns a 502.3 gateway timeout error to the caller. The error can occur when the query is submitted for processing (more likely) or during the get status call (less likely). Users could see it at either stage.
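To make the failure mode concrete, the following is a minimal sketch of how a client-side script might tell the gateway's 502.3 error page apart from a normal response. It is not part of Ambari or HDInsight. The function name `classify_response` is hypothetical, and the assumption that a successful response body is JSON is an illustration only; the only details taken from this article are the 502.3 status and the 2-minute timeout.

```python
import json

# The 2-minute gateway timeout described in this article.
GATEWAY_TIMEOUT_SECONDS = 120


def classify_response(status_code: int, body: str) -> str:
    """Classify a response as 'gateway_timeout', 'ok', or 'other_error'.

    Hypothetical helper: assumes a successful response carries a JSON body,
    and that the gateway error page mentions "502.3" as in the error above.
    """
    if status_code == 502 and "502.3" in body:
        return "gateway_timeout"
    try:
        json.loads(body)
    except ValueError:
        # The body is not JSON (for example, an HTML error page).
        return "other_error"
    return "ok" if 200 <= status_code < 300 else "other_error"
```

A script that polls for query status could use a check like this to log gateway timeouts separately from other failures, which helps confirm that the intermittent errors match the cause described here.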

The HTTP handler thread is supposed to be quick: prepare the job and return a queryid. However, for several reasons, all the handler threads could be busy, resulting in timeouts for new queries and for get status calls.

Responsibilities of the HTTP handler thread

When a client submits a query to HiveServer, HiveServer does the following in the foreground thread:

  • Parse the request and perform semantic verification
  • Acquire a lock
  • Look up the metastore, if necessary
  • Compile the query (DDL or DML)
  • Prepare a query plan
  • Perform authorization (run all applicable Ranger policies in secure clusters)
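The effect of busy handler threads can be illustrated with a simplified back-of-the-envelope model. This is a sketch under stated assumptions, not a description of how HiveServer actually schedules work: it assumes that all requests arrive at the same moment, that each request holds one handler thread for a fixed foreground time (the steps listed above), and that requests are served in arrival order. The thread count and durations in the example are hypothetical.

```python
# The 2-minute gateway timeout described in this article.
GATEWAY_TIMEOUT_SECONDS = 120


def response_time(position: int, handler_threads: int, foreground_seconds: float) -> float:
    """Seconds until the request at 0-based arrival `position` gets a response.

    Simplified model: requests are served in "waves" of `handler_threads`
    requests, and each wave takes `foreground_seconds` to finish.
    """
    waves_ahead = position // handler_threads
    return (waves_ahead + 1) * foreground_seconds


def count_timeouts(requests: int, handler_threads: int, foreground_seconds: float) -> int:
    """Count the requests whose response would arrive after the gateway timeout."""
    return sum(
        1
        for position in range(requests)
        if response_time(position, handler_threads, foreground_seconds) > GATEWAY_TIMEOUT_SECONDS
    )


# Hypothetical load: 12 simultaneous requests, 4 handler threads,
# 50 seconds of foreground work each (for example, slow metastore lookups).
print(count_timeouts(12, 4, 50.0))  # 4
```

In this model the first two waves finish at 50 and 100 seconds, but the third wave finishes at 150 seconds and exceeds the 2-minute limit. Shortening any foreground step, such as metastore lookups or policy evaluation, reduces the number of requests that time out.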

Resolution

Some general recommendations to improve the situation:

  • If you use an external Hive metastore, check the database metrics and make sure that the database isn't overloaded. Consider scaling the metastore database layer.

  • Ensure that parallel ops is turned on (this enables the HTTP handler threads to run in parallel). To verify the value, launch Apache Ambari and navigate to Hive > Configs > Advanced > Custom hive-site. The value for hive.server2.parallel.ops.in.session should be true.

  • Ensure that the cluster's VM SKU isn't too small for the load. Consider splitting the work among multiple clusters. For more information, see Choose a cluster type.

  • If Ranger is installed on the cluster, check whether too many Ranger policies need to be evaluated for each query. Look for duplicate or unneeded policies.

  • Check the HiveServer2 Heap Size value in Ambari. Navigate to Hive > Configs > Settings > Optimization. Make sure the value is larger than 10 GB. Adjust as needed to optimize performance.

  • Ensure that the Hive query is well tuned. For more information, see Optimize Apache Hive queries in Azure HDInsight.
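As an alternative to checking the parallel ops setting in the Ambari UI, the value can also be read from a copy of the Hive configuration. The following is a minimal sketch that assumes you have the contents of hive-site.xml available as a string in the standard Hadoop configuration layout (property elements with name and value children). The sample XML and the helper names `get_property` and `parallel_ops_enabled` are illustrative, not part of any HDInsight tool.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; a real hive-site.xml contains many more properties.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.parallel.ops.in.session</name>
    <value>true</value>
  </property>
</configuration>"""


def get_property(hive_site_xml: str, name: str):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def parallel_ops_enabled(hive_site_xml: str) -> bool:
    """Check that hive.server2.parallel.ops.in.session is explicitly set to true."""
    value = get_property(hive_site_xml, "hive.server2.parallel.ops.in.session")
    return value is not None and value.lower() == "true"


print(parallel_ops_enabled(SAMPLE_HIVE_SITE))  # True
```

The sketch treats a missing property as not enabled, so it only reports success when the value is set explicitly, as the recommendation above describes.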

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal.