Manage Entra ID-enabled Azure HDInsight clusters with ARM templates

Azure HDInsight now supports integration with Microsoft Entra ID, enabling secure, role-based access control and simplified identity management for your big data workloads. By using Azure Resource Manager (ARM) templates, you can deploy and manage Entra ID-enabled HDInsight clusters in a repeatable, automated, and consistent way. In this guide, you'll learn how to set up ARM template resources and configure Entra ID. You'll also see how to manage clusters at scale while meeting your organization's security and governance rules.

Prerequisites

Before you begin, ensure you have the following resources:

  • Azure subscription with sufficient permissions to deploy resources.

  • Microsoft Entra ID tenant configured and accessible.

  • Service principal or managed identity with the necessary role assignments (for example, Contributor and User Access Administrator) to create and manage HDInsight clusters.

  • Azure CLI, PowerShell, or Azure portal access to validate deployment and monitor cluster status.

  • ARM template and parameter files that define the cluster configuration and Entra ID integration settings.

ARM template resource definition

You can deploy the clusters resource type with operations that target resource groups. See the resource group deployment commands.

For a list of changed properties in each API version, see the change log.

Resource format

To create a Microsoft.HDInsight/clusters resource, add the following JSON to your template.

				{
					"$schema": "https://schema.management.azure.com/schemas/2014-04-01-preview/deploymentTemplate.json#",
					"contentVersion": "0.9.0.0",
					"parameters": {
						"clusterName": {
							"type": "string",
							"metadata": {
								"description": "The name of the HDInsight cluster to create."
							}
						},
						"location": {
							"type": "string",
							"defaultValue": "chinaeast2euap",
							"metadata": {
								"description": "The location where all azure resources will be deployed."
							}
						},
						"clusterVersion": {
							"type": "string",
							"defaultValue": "5.1",
							"metadata": {
								"description": "HDInsight cluster version."
							}
						},
						"clusterWorkerNodeCount": {
							"type": "int",
							"defaultValue": 4,
							"metadata": {
								"description": "The number of nodes in the HDInsight cluster."
							}
						},
						"clusterKind": {
							"type": "string",
							"defaultValue": "SPARK",
							"metadata": {
								"description": "The type of the HDInsight cluster to create."
							}
						},
						"sshUserName": {
							"type": "string",
							"defaultValue": "sshuser",
							"metadata": {
								"description": "These credentials can be used to remotely access the cluster."
							}
						},
						"clusterRestAuthEntraUsers": {
							"type": "string",
							"metadata": {
								"description": "These Microsoft Entra users can submit jobs to the cluster and sign in to cluster dashboards."
							}
						},
						"sshPassword": {
							"type": "securestring",
							"metadata": {
								"description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
							}
						},
						"minTlsVersionNumber": {
							"type": "string"
						},
						"isEncryptionInTransitEnabled": {
							"type": "bool"
						}
					},
					"resources": [
						{
							"apiVersion": "2025-01-15-preview",
							"name": "[parameters('clusterName')]",
							"type": "Microsoft.HDInsight/clusters",
							"location": "[parameters('location')]",
							"dependsOn": [],
							"tags": {},
							"zones": null,
							"properties": {
								"clusterVersion": "[parameters('clusterVersion')]",
								"osType": "Linux",
								"tier": "standard",
								"clusterDefinition": {
									"kind": "[parameters('clusterKind')]",
									"componentVersion": {
										"Spark": "3.3"
									},
									"configurations": {
										"gateway": {
											"restAuthCredential.isEnabled": false,
											"restAuthEntraUsers": "[parameters('clusterRestAuthEntraUsers')]"
										}
									}
								},
								"storageProfile": {
									"storageaccounts": [
										{
											"name": "<storageAccountName>.dfs.core.chinacloudapi.cn",
											"isDefault": true,
											"fileSystem": "hmgespark1-2025-10-06t08-34-17-604z",
											"resourceId": "/subscriptions/<SubscriptionID>/resourceGroups/<resourceGroupName>/providers/Microsoft.Storage/storageAccounts/<storageAccountName>",
											"msiResourceId": "/subscriptions/<SubscriptionID>/resourcegroups/<resourceGroupName>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<MSIname>",
											"enableSecureChannel": true
										}
									]
								},
								"computeProfile": {
									"roles": [
										{
											"autoscale": null,
											"name": "headnode",
											"minInstanceCount": 1,
											"targetInstanceCount": 2,
											"hardwareProfile": {
												"vmSize": "Standard_E8_V3"
											},
											"osProfile": {
												"linuxOperatingSystemProfile": {
													"username": "[parameters('sshUserName')]",
													"password": "[parameters('sshPassword')]"
												},
												"windowsOperatingSystemProfile": null
											},
											"virtualNetworkProfile": null,
											"scriptActions": [],
											"dataDisksGroups": null
										},
										{
											"autoscale": {
												"capacity": {
													"minInstanceCount": 4,
													"maxInstanceCount": 5
												},
												"recurrence": null
											},
											"name": "workernode",
											"targetInstanceCount": 4,
											"hardwareProfile": {
												"vmSize": "Standard_E8_V3"
											},
											"osProfile": {
												"linuxOperatingSystemProfile": {
													"username": "[parameters('sshUserName')]",
													"password": "[parameters('sshPassword')]"
												},
												"windowsOperatingSystemProfile": null
											},
											"virtualNetworkProfile": null,
											"scriptActions": [],
											"dataDisksGroups": null
										},
										{
											"autoscale": null,
											"name": "zookeepernode",
											"minInstanceCount": 1,
											"targetInstanceCount": 3,
											"hardwareProfile": {
												"vmSize": "Standard_A2_V2"
											},
											"osProfile": {
												"linuxOperatingSystemProfile": {
													"username": "[parameters('sshUserName')]",
													"password": "[parameters('sshPassword')]"
												},
												"windowsOperatingSystemProfile": null
											},
											"virtualNetworkProfile": null,
											"scriptActions": [],
											"dataDisksGroups": null
										}
									]
								},
								"minSupportedTlsVersion": "[parameters('minTlsVersionNumber')]",
								"encryptionInTransitProperties": {
									"isEncryptionInTransitEnabled": "[parameters('isEncryptionInTransitEnabled')]"
								}
							},
							"identity": {
								"type": "UserAssigned",
								"userAssignedIdentities": {
									"/subscriptions/<SubscriptionID>/resourcegroups/<resourceGroupName>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<MSIname>": {}
								}
							}
						}
					]
				}
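
The template declares the clusterWorkerNodeCount parameter, but the workernode role sets targetInstanceCount to a literal 4. As a minimal sketch, assuming you want the parameter to drive the initial worker count, the workernode role could reference it as in the following fragment (the other role properties are omitted for brevity and stay as shown in the template). It's likely sensible to keep the value within the autoscale capacity range defined for the role.

```json
{
  "name": "workernode",
  "targetInstanceCount": "[parameters('clusterWorkerNodeCount')]"
}
```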

Parameters

				{
				  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
				  "contentVersion": "1.0.0.0",
				  "parameters": {
					"clusterName": {
					  "value": "<clustername>"
					},
					"location": {
					  "value": "<region_name>"
					},
					"clusterVersion": {
					  "value": "5.1"
					},
					"clusterWorkerNodeCount": {
					  "value": 4
					},
					"clusterKind": {
					  "value": "SPARK"
					},
					"sshUserName": {
					  "value": "sshuser"
					},
					"clusterRestAuthEntraUsers": {
					  "value": "[{\"displayName\":\"<Name>\",\"objectId\":\"00000000-0000-0000-0000-1ed7871c38e0\",\"upn\":\"user2@contoso.com\"},{\"displayName\":\"<Name>\",\"objectId\":\"00000000-0000-0000-0000-b44d6570aa30\",\"upn\":\"user1@contoso.com\"}]"
					},
					"sshPassword": {
					  "value": null
					},
					"minTlsVersionNumber": {
					  "value": "1.2"
					},
					"isEncryptionInTransitEnabled": {
					  "value": true
					}
				  }
				}
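
The clusterRestAuthEntraUsers value is a JSON array serialized into a single string, which is why the inner quotation marks are escaped. For readability, the following sketch shows the same structure unescaped, with placeholders in place of real object IDs. It only illustrates the shape of the value; the parameter file still needs the escaped, single-line string form.

```json
[
  {
    "displayName": "<Name>",
    "objectId": "<objectId>",
    "upn": "user1@contoso.com"
  },
  {
    "displayName": "<Name>",
    "objectId": "<objectId>",
    "upn": "user2@contoso.com"
  }
]
```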

Property Values

Autoscale

Name Description Value
capacity Parameters for load-based autoscale AutoscaleCapacity
recurrence Parameters for schedule-based autoscale AutoscaleRecurrence

AutoscaleCapacity

Name Description Value
maxInstanceCount The maximum instance count of the cluster int
minInstanceCount The minimum instance count of the cluster int

AutoscaleRecurrence

Name Description Value
schedule Array of schedule-based autoscale rules AutoscaleSchedule[]
timeZone The time zone for the autoscale schedule times string

AutoscaleSchedule

Name Description Value
days Days of the week for a schedule-based autoscale rule String array containing any of: 'Friday' / 'Monday' / 'Saturday' / 'Sunday' / 'Thursday' / 'Tuesday' / 'Wednesday'
timeAndCapacity Time and capacity for a schedule-based autoscale rule AutoscaleTimeAndCapacity

AutoscaleTimeAndCapacity

Name Description Value
maxInstanceCount The maximum instance count of the cluster int
minInstanceCount The minimum instance count of the cluster int
time 24-hour time in the form HH:MM string
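
The template earlier in this article uses load-based autoscale (capacity) on the workernode role. As a hedged sketch of the schedule-based alternative described by the recurrence tables above, the following fragment scales the role to five nodes on weekday mornings and back to four in the evening. The time zone name, days, times, and instance counts are illustrative assumptions, as is setting the minimum and maximum to the same value to express a fixed size at each scheduled time.

```json
{
  "autoscale": {
    "capacity": null,
    "recurrence": {
      "timeZone": "China Standard Time",
      "schedule": [
        {
          "days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
          "timeAndCapacity": {
            "time": "09:00",
            "minInstanceCount": 5,
            "maxInstanceCount": 5
          }
        },
        {
          "days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
          "timeAndCapacity": {
            "time": "18:00",
            "minInstanceCount": 4,
            "maxInstanceCount": 4
          }
        }
      ]
    }
  }
}
```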

ClusterCreatePropertiesOrClusterGetProperties

Name Description Value
clusterDefinition The cluster definition. ClusterDefinition
clusterVersion The version of the cluster. string
computeIsolationProperties The compute isolation properties. ComputeIsolationProperties
computeProfile The compute profile. ComputeProfile
diskEncryptionProperties The disk encryption properties. DiskEncryptionProperties
encryptionInTransitProperties The encryption-in-transit properties. EncryptionInTransitProperties
kafkaRestProperties The cluster kafka rest proxy configuration. KafkaRestProperties
minSupportedTlsVersion The minimal supported TLS version. string
networkProperties The network properties. NetworkProperties
osType The type of operating system. 'Linux' / 'Windows'
privateLinkConfigurations The private link configurations. PrivateLinkConfiguration[]
securityProfile The security profile. SecurityProfile
storageProfile The storage profile. StorageProfile
tier The cluster tier. 'Premium' / 'Standard'

ClusterDefinition

Name Description Value
blueprint The link to the blueprint. string
componentVersion The versions of different services in the cluster. ClusterDefinitionComponentVersion
configurations The cluster configurations. any
kind The type of cluster. string

ClusterIdentity

Name Description Value
type The type of identity used for the cluster. The type 'SystemAssigned, UserAssigned' includes both an implicitly created identity and a set of user assigned identities. 'None' / 'SystemAssigned' / 'SystemAssigned, UserAssigned' / 'UserAssigned'
userAssignedIdentities The list of user identities associated with the cluster. The user identity dictionary key references are ARM resource IDs in the form: '/subscriptions/{subscriptionID}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. ClusterIdentityUserAssignedIdentities

ComputeIsolationProperties

Name Description Value
enableComputeIsolation Indicates whether compute isolation is enabled. bool
hostSku The host sku. string

ComputeProfile

Name Description Value
roles The list of roles in the cluster. Role[]

DataDisksGroups

Name Description Value
disksPerNode The number of disks per node. int

DiskEncryptionProperties

Name Description Value
encryptionAlgorithm Algorithm identifier for encryption. The default is RSA-OAEP. 'RSA-OAEP' / 'RSA-OAEP-256' / 'RSA1_5'
encryptionAtHost Indicates whether or not resource disk encryption is enabled. bool
keyName Key name that is used for enabling disk encryption. string
keyVersion Specific key version that is used for enabling disk encryption. string
msiResourceId Resource ID of Managed Identity that is used to access the key vault. string
vaultUri Base key vault URI where the customer's key is located, for example, https://myvault.vault.azure.cn. string
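
As an illustration of how the properties in this table fit together, the following fragment sketches a diskEncryptionProperties block that would sit under the cluster's properties object. The vault URI is the example from the table; the key name, key version, and identity are placeholders, and the sketch assumes the managed identity already has access to the key vault.

```json
{
  "diskEncryptionProperties": {
    "vaultUri": "https://myvault.vault.azure.cn",
    "keyName": "<keyName>",
    "keyVersion": "<keyVersion>",
    "encryptionAlgorithm": "RSA-OAEP",
    "msiResourceId": "/subscriptions/<SubscriptionID>/resourcegroups/<resourceGroupName>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<MSIname>"
  }
}
```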

EncryptionInTransitProperties

Name Description Value
isEncryptionInTransitEnabled Indicates whether communication between cluster nodes is encrypted in transit. bool

HardwareProfile

Name Description Value
vmSize The size of the Virtual Machine string

IPConfiguration

Name Description Value
name The name of private link IP configuration. string (required)
properties The private link IP configuration properties. IPConfigurationProperties

IPConfigurationProperties

Name Description Value
primary Indicates whether this IP configuration is primary for the corresponding NIC. bool
privateIPAddress The IP address. string
privateIPAllocationMethod The method used to allocate the private IP address. 'dynamic' / 'static'
subnet The subnet resource ID. ResourceID

IpTag

Name Description Value
ipTagType The IP tag type, for example, FirstPartyUsage. string (required)
tag The value of the IP tag associated with the public IP, for example, HDInsight, SQL, or Storage. string (required)

KafkaRestProperties

Name Description Value
clientGroupInfo The information of Microsoft Entra ID security group. ClientGroupInfo
configurationOverride The configurations that need to be overridden. KafkaRestPropertiesConfigurationOverride

LinuxOperatingSystemProfile

Name Description Value
password The password. string
sshProfile The SSH profile. SshProfile
username The username. string

Microsoft.HDInsight/clusters

Name Description Value
apiVersion The API version. '2025-01-15-preview'
identity The identity of the cluster, if configured. ClusterIdentity
location The location of the cluster. string
name The resource name. string (required)
properties The clusters create parameters. ClusterCreatePropertiesOrClusterGetProperties
tags Resource tags. Dictionary of tag names and values. See Tags in templates
type The resource type. 'Microsoft.HDInsight/clusters'
zones The availability zones. string[]

NetworkProperties

Name Description Value
outboundDependenciesManagedType A value that describes how the outbound dependencies of an HDInsight cluster are managed. 'Managed' means the HDInsight service manages the outbound dependencies. 'External' means a customer-specific solution manages the outbound dependencies. 'External' / 'Managed'
privateLink Indicates whether or not private link is enabled. 'Disabled' / 'Enabled'
publicIpTag The IP tag for the public IPs created along with the HDInsight clusters. IpTag
resourceProviderConnection The direction for the resource provider connection. 'Inbound' / 'Outbound'

OsProfile

Name Description Value
linuxOperatingSystemProfile The Linux OS profile. LinuxOperatingSystemProfile

PrivateLinkConfiguration

Name Description Value
name The name of private link configuration. string (required)
properties The private link configuration properties. PrivateLinkConfigurationProperties (required)

PrivateLinkConfigurationProperties

Name Description Value
groupId The HDInsight private linkable subresource name to apply the private link configuration to, for example, 'headnode', 'gateway', or 'edgenode'. string (required)
ipConfigurations The IP configurations for the private link service. IPConfiguration[] (required)
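
The network and private link tables above can be combined as in the following sketch, which would sit under the cluster's properties object. All names and resource IDs are placeholders. The camelCase spellings groupId and id, and the pairing of an enabled private link with an outbound resource provider connection, are assumptions based on common HDInsight private link setups rather than something this article states, so check them against the API version you deploy.

```json
{
  "networkProperties": {
    "resourceProviderConnection": "Outbound",
    "privateLink": "Enabled"
  },
  "privateLinkConfigurations": [
    {
      "name": "<privateLinkConfigurationName>",
      "properties": {
        "groupId": "headnode",
        "ipConfigurations": [
          {
            "name": "<ipConfigurationName>",
            "properties": {
              "primary": true,
              "privateIPAllocationMethod": "dynamic",
              "subnet": {
                "id": "/subscriptions/<SubscriptionID>/resourceGroups/<resourceGroupName>/providers/Microsoft.Network/virtualNetworks/<vnetName>/subnets/<subnetName>"
              }
            }
          }
        ]
      }
    }
  ]
}
```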

ResourceID

Name Description Value
id The Azure resource ID. string

Role

Name Description Value
autoscale The autoscale configurations. Autoscale
dataDisksGroups The data disks groups for the role. DataDisksGroups[]
encryptDataDisks Indicates whether to encrypt the data disks. bool
hardwareProfile The hardware profile. HardwareProfile
minInstanceCount The minimum instance count of the cluster. int
name The name of the role. string
osProfile The operating system profile. OsProfile
scriptActions The list of script actions on the role. ScriptAction[]
targetInstanceCount The instance count of the cluster. int
virtualNetworkProfile The virtual network profile. VirtualNetworkProfile
VMGroupName The name of the virtual machine group. string

ScriptAction

Name Description Value
name The name of the script action. string (required)
parameters The parameters for the script provided. string (required)
uri The URI to the script. string (required)
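
The template in this article leaves scriptActions empty for every role. As a minimal sketch, assuming a script hosted at a location the cluster can reach, a role could run one script action as follows. The script name and URI are hypothetical placeholders, and the empty parameters string assumes the script takes no arguments.

```json
{
  "scriptActions": [
    {
      "name": "<scriptActionName>",
      "uri": "https://<storageAccountName>.blob.core.chinacloudapi.cn/<containerName>/<scriptName>.sh",
      "parameters": ""
    }
  ]
}
```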

StorageAccount

Name Description Value
container The container in the storage account, only to be specified for WASB storage accounts. string
enableSecureChannel Indicates whether the secure channel is enabled. This field is optional. The default value is false when the cluster version is earlier than 5.1 and true when the cluster version is 5.1 or later. bool
fileshare The file share name. string
fileSystem The filesystem, only to be specified for Azure Data Lake Storage Gen 2. string
isDefault Whether or not the storage account is the default storage account. bool
key The storage account access key. string
msiResourceId The managed identity (MSI) that is allowed to access the storage account, only to be specified for Azure Data Lake Storage Gen 2. string
name The name of the storage account. string
resourceId The resource ID of storage account, only to be specified for Azure Data Lake Storage Gen 2. string
saskey The shared access signature key. string

StorageProfile

Name Description Value
storageaccounts The list of storage accounts in the cluster. StorageAccount[]

UserAssignedIdentity

Name Description Value
tenantId The tenant ID of user assigned identity. string

VirtualNetworkProfile

Name Description Value
id The ID of the virtual network. string
subnet The name of the subnet. string
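
The template in this article sets virtualNetworkProfile to null on every role. As a hedged sketch of a role joined to a virtual network, the following fragment uses placeholder resource IDs. Supplying the full subnet resource ID, rather than only the subnet name, is an assumption based on commonly published HDInsight templates; verify the expected form for your API version.

```json
{
  "virtualNetworkProfile": {
    "id": "/subscriptions/<SubscriptionID>/resourceGroups/<resourceGroupName>/providers/Microsoft.Network/virtualNetworks/<vnetName>",
    "subnet": "/subscriptions/<SubscriptionID>/resourceGroups/<resourceGroupName>/providers/Microsoft.Network/virtualNetworks/<vnetName>/subnets/<subnetName>"
  }
}
```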

By managing Microsoft Entra ID-enabled Azure HDInsight clusters with ARM templates, you gain a consistent, repeatable, and automated way to deploy and configure secure big data environments. ARM templates let you define all cluster settings, including identity, networking, storage, and security, as code, so that deployments are predictable and comply with organizational standards.

This approach not only streamlines cluster provisioning and management but also integrates seamlessly with CI/CD pipelines, enabling you to scale and govern your HDInsight workloads with confidence.