How to custom configure HDInsight Autoscale
The following are a few configurations that can be tuned to customize HDInsight Autoscale behavior per customer needs.
Note
This is applicable to HDInsight 4.0 and 5.0 stacks.
Configuration | Description | Default value | Applicable cluster/Autoscale type | Remarks |
---|---|---|---|---|
yarn.4_0.graceful.decomm.workaround.enable | Enable YARN graceful decommissioning | Load based autoscale - True<br>Schedule based autoscale - True | Hadoop/Spark | If this config is disabled, YARN moves nodes directly from the running state to the decommissioned state without waiting for the applications using the node to finish. This might lead to applications getting killed abruptly when nodes are decommissioned. Read more about job resiliency in YARN here |
yarn.graceful.decomm.timeout | YARN graceful decommissioning timeout in seconds | Hadoop load based - 3600<br>Spark schedule based - 1<br>Hadoop schedule based - 1<br>Spark load based - 86400 | Hadoop/Spark | The graceful decommissioning timeout is best configured according to customer applications. For example, if an application has many mappers and few reducers and takes 4 hours to complete, this configuration needs to be set to more than 4 hours (14,400 seconds) |
yarn.max.scale.up.increment | Maximum number of nodes to scale up in one go | 200 | Hadoop/Spark/Interactive Query | It has been tested with 200 nodes. We don't recommend setting this value to more than 200. It can be set to less than 200 if the customer wants a less aggressive scale-up |
yarn.max.scale.down.increment | Maximum number of nodes to scale down in one go | 50 | Hadoop/Spark/Interactive Query | Can be set to a value of up to 100 |
nodemanager.recommission.enabled | Feature to enable recommissioning of decommissioning NMs before adding new nodes to the cluster | True | Hadoop/Spark load based autoscale | Disabling this feature can cause underutilization of the cluster: nodes can sit in the decommissioning state with no containers to run, waiting for applications to finish, even when there's more load in the cluster. Note: Applicable for images on 2304280205 or later |
UnderProvisioningDiagnoser.time.ms | Time in milliseconds for which the cluster needs to be underprovisioned for a scale up to trigger | 180000 | Hadoop/Spark load based autoscaling | - |
OverProvisioningDiagnoser.time.ms | Time in milliseconds for which the cluster needs to be overprovisioned for a scale down to trigger | 180000 | Hadoop/Spark load based autoscaling | - |
hdfs.decommission.enable | Decommission datanodes before triggering decommissioning of node managers. HDFS doesn't support a graceful decommission timeout; decommissioning is immediate | True | Hadoop/Spark load based autoscaling | Datanodes are decommissioned before nodemanagers so that a decommissioning datanode isn't used for storing shuffle data. |
scaling.recommission.cooldown.ms | Cooldown period after recommissioning during which no metrics are sampled | 120000 | Hadoop/Spark load based autoscaling | This cooldown period ensures the cluster has some time to redistribute load to the newly recommissioned nodemanagers. Note: Applicable for images on 2304280205 or later |
scale.down.nodes.with.am | Scale down nodes where an AM is running | false | Hadoop/Spark | Can be turned on if enough retry attempts are configured for the AM. Useful for cases with long-running applications (for example, Spark streaming) that can be killed in order to scale down the cluster when load has reduced. Note: Applicable for images on 2304280205 or later |
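For the graceful-decommissioning settings above, one way to verify the behavior during a scale-down is to watch node states from a cluster headnode. The following is a minimal sketch, assuming an SSH session to a headnode where the YARN CLI is available; it isn't specific to any particular HDInsight image.

```bash
# Sketch: observe YARN node states from a headnode during a scale-down.
# Nodes draining under graceful decommissioning appear in the DECOMMISSIONING state
# until their applications finish or the configured graceful decommission timeout expires.
yarn node -list -states DECOMMISSIONING,DECOMMISSIONED
```

If nodes stay in the DECOMMISSIONING state long after their applications should have finished, that's a hint the yarn.graceful.decomm.timeout value may be larger than the workloads require.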
Note
- The above configs can be changed by running this script on the headnodes as a script action; see this readme to understand how to run the script. An illustrative example of submitting a script action follows this list.
- Customers are advised to test the configurations in lower environments before moving to production.
- To find your cluster's image version, see How to check image version.
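As a rough illustration of the script-action approach mentioned above, the following Azure CLI sketch submits a script to the headnodes. The resource group, cluster name, script URI, and parameter string are placeholders, not the actual script location or syntax; consult the linked readme for those.

```bash
# Sketch: submit the autoscale configuration script to the headnodes as a script action.
# All resource names, the script URI, and the parameter format below are placeholders;
# the real script and its accepted parameters are documented in the linked readme.
az hdinsight script-action execute \
  --resource-group myResourceGroup \
  --cluster-name myHdiCluster \
  --name update-autoscale-configs \
  --script-uri "https://<storage-account>.blob.core.windows.net/scripts/update_autoscale_configs.sh" \
  --script-parameters "yarn.graceful.decomm.timeout=14400" \
  --roles headnode \
  --persist-on-success
```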
For guidelines on scaling clusters manually, see Scaling guidelines.
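If a manual resize is needed alongside or instead of Autoscale, the cluster's worker node count can also be changed directly. A minimal sketch with placeholder resource names follows; refer to the Scaling guidelines for when and how to scale manually on a cluster that also has Autoscale configured.

```bash
# Sketch: manually resize a cluster's worker node count with the Azure CLI.
# Resource group, cluster name, and node count are placeholders.
az hdinsight resize \
  --resource-group myResourceGroup \
  --name myHdiCluster \
  --workernode-count 10
```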