Understand and resolve errors received from WebHCat on HDInsight

Learn about the errors you may receive when using WebHCat with HDInsight, and how to resolve them. WebHCat is used internally by client-side tools such as Azure PowerShell and the Data Lake Tools for Visual Studio.

What is WebHCat

WebHCat is a REST API for HCatalog, a table and storage management layer for Apache Hadoop. WebHCat is enabled by default on HDInsight clusters, and various tools use it to submit jobs, get job status, and perform similar operations without logging in to the cluster.

Modifying configuration

Several of the errors listed in this document occur because a configured maximum has been exceeded. When a resolution step says that you can change a value, use Apache Ambari (web UI or REST API) to modify it. For more information, see Manage HDInsight using Apache Ambari.
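As an illustration of the REST route, the sketch below only builds the JSON body for a configuration change; it does not contact a cluster. It rests on several assumptions that are not stated in this article: that Ambari accepts the change as a `Clusters.desired_config` object sent with a PUT to the cluster resource, that the WebHCat settings live in a config type named `webhcat-site`, and that the whole property set must be resent along with the changed value. The helper name `build_desired_config` and the sample property values are hypothetical.

```python
import json
import time


def build_desired_config(config_type, current_properties, overrides, tag=None):
    """Build the JSON body for an Ambari configuration update.

    Assumption: Ambari replaces the whole property set of a config type,
    so the body carries every current property plus the overrides.
    """
    properties = dict(current_properties)  # copy so the caller's dict is untouched
    properties.update(overrides)
    if tag is None:
        # Assumption: any tag that has not been used before is acceptable.
        tag = "version" + str(int(time.time() * 1000))
    return {
        "Clusters": {
            "desired_config": {
                "type": config_type,
                "tag": tag,
                "properties": properties,
            }
        }
    }


# Hypothetical current values, as they might be read from Ambari beforehand.
current = {"templeton.exec.max-procs": "20", "some.other.setting": "unchanged"}
body = build_desired_config(
    "webhcat-site", current, {"templeton.exec.max-procs": "40"}, tag="version1"
)
print(json.dumps(body, indent=2))
```

Keeping the payload construction separate from the HTTP call makes it easy to inspect the body before anything is sent to the cluster.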

Default configuration

Exceeding the following default values can degrade WebHCat performance or cause errors:

| Setting | What it does | Default value |
| --- | --- | --- |
| yarn.scheduler.capacity.maximum-applications | The maximum number of jobs that can be active concurrently (pending or running) | 10,000 |
| templeton.exec.max-procs | The maximum number of requests that can be served concurrently | 20 |
| mapreduce.jobhistory.max-age-ms | How long job history is retained, specified in milliseconds | 7 days |
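Because the name of mapreduce.jobhistory.max-age-ms indicates a value in milliseconds while the default is quoted in days, a small conversion helps when choosing a new value. The helper name `days_to_ms` is hypothetical; the arithmetic is the only thing this sketch shows.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def days_to_ms(days):
    """Convert a retention period in days to milliseconds."""
    return int(days * MS_PER_DAY)


default_retention_ms = days_to_ms(7)
print(default_retention_ms)  # 604800000
```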

Too many requests

HTTP status code: 429

| Cause | Resolution |
| --- | --- |
| You have exceeded the maximum number of concurrent requests served by WebHCat per minute (default 20) | Reduce your workload so that you do not submit more than the maximum number of concurrent requests, or raise the concurrent request limit by modifying templeton.exec.max-procs. For more information, see Modifying configuration. |
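One way to reduce the workload on the client side is to cap how many requests are in flight at once. The sketch below is a minimal illustration using a bounded thread pool; `run_throttled` and `fake_submit` are hypothetical names, and `fake_submit` stands in for whatever call actually submits a job. Other tools that share the same cluster also count toward the limit, so a cap well below 20 may be the safer choice.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 20  # mirrors the default templeton.exec.max-procs


def run_throttled(tasks, limit=MAX_IN_FLIGHT):
    """Run zero-argument callables with at most `limit` executing at once."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


# Demonstration with a stand-in task that records peak concurrency.
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_submit():
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # pretend to wait for the server
    with stats_lock:
        stats["active"] -= 1
    return "ok"


results = run_throttled([fake_submit] * 50, limit=5)
print(len(results), "tasks done; peak concurrency:", stats["peak"])
```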

Server unavailable

HTTP status code: 503

| Cause | Resolution |
| --- | --- |
| This status code usually occurs during failover between the primary and secondary HeadNode for the cluster | Wait two minutes, then retry the operation |
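The wait-and-retry advice can be sketched as a small helper. This is an illustration under assumptions, not a prescribed client: `retry_on_503` is a hypothetical name, `send` stands for any zero-argument callable that returns a `(status_code, body)` pair, and the attempt count of 3 is an arbitrary choice. The sleep function is injectable so the logic can be exercised without actually waiting.

```python
import time


def retry_on_503(send, wait_seconds=120, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a status other than 503.

    Waits `wait_seconds` between attempts and returns the last response
    once `max_attempts` has been reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503 or attempt == max_attempts:
            return status, body
        sleep(wait_seconds)


# Demonstration with canned responses and a recording sleep function.
responses = iter([(503, ""), (200, "ok")])
sleeps = []
result = retry_on_503(lambda: next(responses), sleep=sleeps.append)
print(result, sleeps)
```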

Bad request content: Could not find job

HTTP status code: 400

| Cause | Resolution |
| --- | --- |
| Job details have been cleaned up by the job history cleaner | The default retention period for job history is 7 days. You can change it by modifying mapreduce.jobhistory.max-age-ms. For more information, see Modifying configuration. |
| The job was killed because of a failover | Retry job submission for up to two minutes |
| An invalid job ID was used | Check that the job ID is correct |
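A quick format check can catch obviously malformed job IDs before a request is sent. The sketch below assumes the usual Hadoop convention of `job_<cluster timestamp>_<sequence number>`; the pattern, the helper name `looks_like_job_id`, and the sample IDs are illustrative. Passing the check says nothing about whether the job exists or is still within the retention period.

```python
import re

# Assumed shape: "job_", digits, underscore, digits (for example job_1415651640909_0026).
JOB_ID_PATTERN = re.compile(r"job_\d+_\d+")


def looks_like_job_id(value):
    """Return True if value matches the assumed job ID format exactly."""
    return JOB_ID_PATTERN.fullmatch(value) is not None


for candidate in ("job_1415651640909_0026",
                  "application_1415651640909_0026",
                  "job_1415651640909_0026 "):
    print(repr(candidate), looks_like_job_id(candidate))
```

Whitespace is deliberately not stripped, so an ID copied with a trailing space is reported as malformed.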

Bad gateway

HTTP status code: 502

| Cause | Resolution |
| --- | --- |
| Internal garbage collection is occurring within the WebHCat process | Wait for garbage collection to finish, or restart the WebHCat service |
| Timed out waiting for a response from the ResourceManager service. This error can occur when the number of active applications reaches the configured maximum (default 10,000) | Wait for currently running jobs to complete, or raise the concurrent job limit by modifying yarn.scheduler.capacity.maximum-applications. For more information, see Modifying configuration. |
| Attempting to retrieve all jobs through the GET /jobs call while Fields is set to * | Do not retrieve all job details. Instead, use jobid to retrieve details only for jobs whose ID is greater than a given job ID, or do not use Fields. |
| The WebHCat service is down during HeadNode failover | Wait two minutes, then retry the operation |
| More than 500 pending jobs have been submitted through WebHCat | Wait until the currently pending jobs have completed before submitting more |
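To illustrate the GET /jobs advice, the sketch below builds a request URL that pages through jobs by job ID instead of asking for every field of every job. It makes assumptions beyond this article: that the endpoint is exposed at `https://<cluster>.azurehdinsight.net/templeton/v1/jobs`, that the query parameter names are `jobid`, `numrecords`, and `fields`, and that `numrecords` limits how many records are returned. The function name `build_jobs_url` and the cluster name are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_jobs_url(cluster_name, after_jobid=None, numrecords=None, fields=None):
    """Build a GET /jobs URL that can page by job ID.

    after_jobid: only jobs with an ID greater than this one are requested.
    numrecords:  assumed parameter limiting the number of records returned.
    fields:      left unset by default to avoid the expensive fields=* query.
    """
    base = f"https://{cluster_name}.azurehdinsight.net/templeton/v1/jobs"
    params = {}
    if after_jobid:
        params["jobid"] = after_jobid
    if numrecords is not None:
        params["numrecords"] = numrecords
    if fields:
        params["fields"] = fields
    return base + ("?" + urlencode(params) if params else "")


url = build_jobs_url("mycluster", after_jobid="job_1415651640909_0026", numrecords=50)
print(url)
```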