Apache Ambari heartbeat issues in Azure HDInsight
This article describes troubleshooting steps and possible resolutions for Ambari agent heartbeat issues when interacting with Azure HDInsight clusters.
Scenario: High CPU utilization
The Ambari agent has high CPU utilization, which causes the Ambari UI to raise alerts that the Ambari agent heartbeat has been lost for some nodes. These heartbeat-lost alerts are usually transient.
In rare cases, various ambari-agent bugs can drive ambari-agent CPU utilization close to 100 percent.
Identify the process ID (PID) of ambari-agent:
ps -ef | grep ambari_agent
Then run the following command to show CPU utilization:
top -p <ambari-agent-pid>
Restart ambari-agent to mitigate the issue:
service ambari-agent restart
If the restart doesn't work, kill the ambari-agent process and then start it again:
kill -9 <ambari-agent-pid>
service ambari-agent start
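The checks and restarts above can be scripted. The following is a minimal Bash sketch, not an official tool: it assumes Bash, pgrep and ps from procps, the service command, and root privileges on the node. The function names, the 90 percent default threshold, and the process pattern ambari_agent/.*\.py (which assumes the agent runs as a Python script under a path containing ambari_agent/) are hypothetical choices; compare them against your own ps -ef output. Unlike top, ps -o %cpu reports CPU averaged over the process lifetime, so treat it as a rough signal.

```shell
#!/usr/bin/env bash
# Sketch only: detect a CPU-bound ambari-agent and restart it.
# Assumes Bash, pgrep/ps from procps, the "service" command, and root privileges.

# Print PIDs of processes whose command line looks like the agent's Python scripts.
# Assumption: the agent runs as .../ambari_agent/<something>.py; adjust if needed.
ambari_agent_pids() {
  pgrep -f 'ambari_agent/.*\.py' || true
}

# Succeed (exit 0) if CPU value $1 is greater than or equal to threshold $2.
# awk is used because shell arithmetic cannot compare fractional values like 97.3.
cpu_at_least() {
  awk -v cpu="$1" -v limit="$2" 'BEGIN { exit !(cpu + 0 >= limit + 0) }'
}

# Hypothetical helper: restart ambari-agent if any agent process reaches the
# threshold (default 90). ps %cpu is a lifetime average, not an instant reading.
restart_if_cpu_high() {
  local threshold="${1:-90}" pids pid cpu hot=0
  pids=$(ambari_agent_pids)
  if [ -z "$pids" ]; then
    echo "No ambari-agent process found" >&2
    return 1
  fi
  for pid in $pids; do
    cpu=$(ps -o %cpu= -p "$pid" | tr -d ' ')
    [ -z "$cpu" ] && continue   # process exited between pgrep and ps
    echo "ambari-agent PID $pid: ${cpu}% CPU"
    if cpu_at_least "$cpu" "$threshold"; then
      hot=1
    fi
  done
  if [ "$hot" -eq 1 ]; then
    service ambari-agent restart
  fi
}

# Fallback when a normal restart doesn't help: kill -9 the agent, then start it.
force_restart_ambari_agent() {
  local pids
  pids=$(ambari_agent_pids)
  if [ -n "$pids" ]; then
    # Unquoted on purpose: one kill argument per PID.
    kill -9 $pids
  fi
  service ambari-agent start
}
```

For example, source the file and run restart_if_cpu_high 90; if CPU stays high afterward, run force_restart_ambari_agent. The narrower pattern is deliberate: a bare pgrep -f ambari_agent would also match any shell running a script whose path contains ambari_agent, and force_restart_ambari_agent would then kill that shell.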
Scenario: Ambari agent didn't start
The Ambari agent hasn't started, which causes the Ambari UI to raise alerts that the Ambari agent heartbeat has been lost for some nodes.
The alerts are caused by the Ambari agent not running.
Confirm the status of ambari-agent:
service ambari-agent status
Confirm whether the failover controller services are running:
ps -ef | grep failover
If the failover controller services aren't running, a problem is likely preventing hdinsight-agent from starting the failover controller. Check the hdinsight-agent log in the /var/log/hdinsight-agent/hdinsight-agent.out file.
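The three checks above can be run in one pass. This is a minimal Bash sketch under the assumptions that Bash and the service command are available and that you have root privileges (or use sudo) to read the agent log; the function names and the 50-line tail length are hypothetical. It only reports and does not start or restart anything.

```shell
#!/usr/bin/env bash
# Sketch only: run the checks for an ambari-agent that hasn't started.
# Assumes Bash and root privileges (or sudo) for reading the agent log.

HDI_AGENT_LOG=/var/log/hdinsight-agent/hdinsight-agent.out

# Count stdin lines containing pattern $1; prints 0 when nothing matches.
count_matching() {
  grep -c -- "$1" || true
}

# Hypothetical helper: status check, failover controller check, then the log.
diagnose_agent_not_started() {
  echo "== ambari-agent status =="
  service ambari-agent status || true

  echo "== failover controller processes =="
  local n
  # Drop lines containing "grep" so the search itself isn't counted.
  n=$(ps -ef | grep -v grep | count_matching failover)
  echo "$n matching process(es)"

  # Only show the hdinsight-agent log when no failover controller is running.
  if [ "$n" -eq 0 ]; then
    echo "== last 50 lines of $HDI_AGENT_LOG =="
    tail -n 50 "$HDI_AGENT_LOG"
  fi
}
```

Source the file and run diagnose_agent_not_started on the affected node.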
Scenario: Heartbeat lost due to OMS logs
The Ambari agent heartbeat was lost.
OMS logs are causing high CPU utilization.
- Disable Azure Monitor logging using the Disable-AzHDInsightMonitoring PowerShell cmdlet.
- Delete the mdsd.warn log file.
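Deleting the file can be done from the node's shell. The source doesn't state where mdsd.warn is written, so this minimal Bash sketch assumes it is somewhere under /var/log and searches for it there; the function name is hypothetical. Run find /var/log -type f -name mdsd.warn on its own first to confirm which files would be removed.

```shell
#!/usr/bin/env bash
# Sketch only: locate and delete mdsd.warn log files.
# Assumption: the file lives somewhere under /var/log; the source gives no path.

# Hypothetical helper: print each mdsd.warn file under $1 (default /var/log),
# then delete it. Requires a find that supports -delete (for example, GNU find).
remove_mdsd_warn() {
  local root="${1:-/var/log}"
  find "$root" -type f -name 'mdsd.warn' -print -delete
}
```

Disabling Azure Monitor logging is a separate step: Disable-AzHDInsightMonitoring is an Az PowerShell cmdlet and runs from a PowerShell session with the Az module installed, not from this shell script.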
If you didn't see your problem or are unable to solve your issue, use the following channel for more support:
- If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.