Scenario: Cluster node runs out of disk space in Azure HDInsight
This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.
Issue
A job may fail with an error message similar to: /usr/hdp/2.6.3.2-14/hadoop/libexec/hadoop-config.sh: fork: No space left on device.
Or you may receive an Apache Ambari alert similar to: local-dirs usable space is below configured utilization percentage.
Cause
The Apache YARN application cache may have consumed all available disk space. Your Spark application is likely running inefficiently.
Resolution
1. Use the Ambari UI to determine which node is running out of disk space.
2. Determine which folder on the offending node contributes most of the disk usage. SSH to the node first, then run df to list disk usage for all mounts. Usually it's /mnt, the temporary disk used by the OS. You can enter a folder, then run sudo du -hs to show the summarized size of the files under it. If you see a folder similar to /mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1537280705629_0007, the application is still running. The space may be consumed by RDD persistence or intermediate shuffle files.
3. To mitigate the issue, kill the application, which releases the disk space used by that application. See the command sketch after these steps.
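For reference, here is a minimal command sketch of the inspection and cleanup steps above. The mount point and application ID come from the example output in this article; substitute the values you observe on your own node.

```bash
# Run on the node that Ambari reports as low on disk.

# 1. List disk usage for all mounts; /mnt (the temporary disk) is usually the full one.
df -h

# 2. Summarize folder sizes under the YARN local cache to find the largest
#    application folder. The path below is the example from this article;
#    adjust the mount point and user (livy) to match your cluster.
sudo du -hs /mnt/resource/hadoop/yarn/local/usercache/*/appcache/*

# 3. If a still-running application owns the space, kill it to release the disk.
#    application_1537280705629_0007 is the example ID from this article.
yarn application -kill application_1537280705629_0007
```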
If the issue happens frequently on the worker nodes, you can tune the YARN local cache settings on the cluster.
1. Open the Ambari UI and navigate to YARN --> Configs --> Advanced.
2. Add the following two properties to the custom yarn-site.xml section and save:
   yarn.nodemanager.localizer.cache.target-size-mb=2048
   yarn.nodemanager.localizer.cache.cleanup.interval-ms=300000
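These values cap the per-node localizer cache at about 2 GB and run the cache cleanup every 300,000 ms (5 minutes). Once Ambari has pushed the change and restarted the NodeManagers, you can confirm the values on a worker node; this is a sketch assuming the typical HDP/HDInsight config location, so verify the path on your cluster.

```bash
# Check that the new cache settings reached the node.
# /etc/hadoop/conf/yarn-site.xml is the usual location; adjust if your cluster differs.
grep -A 1 "yarn.nodemanager.localizer.cache" /etc/hadoop/conf/yarn-site.xml
```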
If the preceding steps don't permanently fix the issue, optimize your application.
Next steps
If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
- If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.