Use Enterprise Security Package in HDInsight

The standard Azure HDInsight cluster is a single-user cluster. It's suitable for most companies that have smaller application teams building large data workloads. Each user can create a dedicated cluster on demand and destroy it when it's no longer needed.

Many enterprises have moved toward a model in which IT teams manage clusters and multiple application teams share them. These larger enterprises need multiuser access to each cluster in Azure HDInsight.

HDInsight relies on a popular identity provider, Active Directory, in a managed way. By integrating HDInsight with Azure Active Directory Domain Services (Azure AD DS), you can access the clusters by using your domain credentials.

The virtual machines (VMs) in HDInsight are joined to the domain that you provide. As a result, all the services running on HDInsight (Apache Ambari, Apache Hive server, Apache Ranger, Apache Spark Thrift server, and others) work seamlessly for the authenticated user. Administrators can then create strong authorization policies by using Apache Ranger to provide role-based access control for resources in the cluster.

Integrate HDInsight with Active Directory

Open-source Apache Hadoop relies on the Kerberos protocol for authentication and security. Therefore, HDInsight cluster nodes with Enterprise Security Package (ESP) are joined to a domain that's managed by Azure AD DS, and Kerberos security is configured for the Hadoop components on the cluster.

The following items are created automatically:

  • A service principal for each Hadoop component
  • A machine principal for each machine that's joined to the domain
  • An organizational unit (OU) for each cluster to store these service and machine principals

To summarize, you need to set up an environment with:

  • An Active Directory domain (managed by Azure AD DS). The domain name must be 39 characters or less to work with Azure HDInsight.
  • Secure LDAP (LDAPS) enabled in Azure AD DS.
  • Proper networking connectivity from the HDInsight virtual network to the Azure AD DS virtual network, if you choose separate virtual networks for them. A VM inside the HDInsight virtual network should have a line of sight to Azure AD DS through virtual network peering. If HDInsight and Azure AD DS are deployed in the same virtual network, the connectivity is provided automatically and no further action is needed.
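
The 39-character limit on the domain name is easy to check before any deployment starts. The following is a minimal sketch of such a check; the function name `Test-EspDomainNameLength` and the sample domain are hypothetical and are not part of HDInsight or any Azure module.

```powershell
# Hypothetical helper: returns $true when the domain name fits the
# 39-character limit stated in the requirements above.
function Test-EspDomainNameLength {
    param(
        [Parameter(Mandatory)] [string] $DomainName,
        [int] $MaxLength = 39   # limit quoted in this article
    )
    return $DomainName.Length -le $MaxLength
}

# Example call with a placeholder domain name
Test-EspDomainNameLength -DomainName "contoso.onmicrosoft.com"
```

The check only covers the length requirement; LDAPS and network connectivity still have to be verified separately.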

Set up different domain controllers

HDInsight currently supports only Azure AD DS as the main domain controller that the cluster uses for Kerberos communication. Other, more complex Active Directory setups are possible, as long as they result in Azure AD DS being enabled for HDInsight access.

Azure Active Directory Domain Services

Azure AD DS provides a managed domain that's fully compatible with Windows Server Active Directory. Microsoft takes care of managing, patching, and monitoring the domain in a highly available (HA) setup. You can deploy your cluster without worrying about maintaining domain controllers.

Users, groups, and passwords are synchronized from Azure AD. The one-way sync from your Azure AD instance to Azure AD DS enables users to sign in to the cluster by using the same corporate credentials.

For more information, see Configure HDInsight clusters with ESP using Azure AD DS.

On-premises Active Directory or Active Directory on IaaS VMs

If you have an on-premises Active Directory instance or a more complex Active Directory setup for your domain, you can sync those identities to Azure AD by using Azure AD Connect. You can then enable Azure AD DS on that Azure AD tenant.

Because Kerberos relies on password hashes, you must enable password hash sync on Azure AD DS.

If you're using federation with Active Directory Federation Services (AD FS), you must enable password hash sync. Password hash sync helps with disaster recovery if your AD FS infrastructure fails, and it also helps provide leaked-credential protection. For more information, see Enable password hash sync with Azure AD Connect sync.

Using on-premises Active Directory or Active Directory on IaaS VMs alone, without Azure AD and Azure AD DS, isn't a supported configuration for HDInsight clusters with ESP.

If federation is being used and password hashes are synced correctly, but you're getting authentication failures, check whether cloud password authentication is enabled for the PowerShell service principal. To check and set the Home Realm Discovery (HRD) policy:

  1. Install the preview Azure AD PowerShell module.

    Install-Module AzureADPreview
  2. Connect using global administrator (tenant administrator) credentials.

  3. Check if the Microsoft Azure PowerShell service principal has already been created.

    Get-AzureADServicePrincipal -SearchString "Microsoft Azure Powershell"
  4. If it doesn't exist, create the service principal.

    $powershellSPN = New-AzureADServicePrincipal -AppId 1950a258-227b-4e31-a9cf-717495945fc2
  5. Create the policy and attach it to this service principal.

     # Determine whether the policy already exists
     Get-AzureADPolicy | Where-Object {$_.DisplayName -eq "EnableDirectAuth"}
     # Create the policy if it doesn't exist
     $policy = New-AzureADPolicy `
         -Definition @('{"HomeRealmDiscoveryPolicy":{"AllowCloudPasswordValidation":true}}') `
         -DisplayName "EnableDirectAuth" `
         -Type "HomeRealmDiscoveryPolicy"
     # Determine whether a policy is already attached to the service principal
     Get-AzureADServicePrincipalPolicy `
         -Id $powershellSPN.ObjectId
     # Attach the policy to the service principal if it isn't attached yet
     Add-AzureADServicePrincipalPolicy `
         -Id $powershellSPN.ObjectId `
         -RefObjectId $policy.Id
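
The steps above leave the "if it doesn't exist" checks to the reader, and `$powershellSPN` and `$policy` are only assigned when the objects are newly created. The following is a minimal sketch of the same procedure as one script that can be rerun safely. It uses only the cmdlets and values from the steps, plus `Connect-AzureAD` for step 2. It assumes that the policy cmdlets come from the AzureADPreview module and that looking up the service principal by application ID is an acceptable substitute for the display-name search in step 3.

```powershell
# Sketch only: assumes the AzureADPreview module is installed and that you
# sign in with global administrator (tenant administrator) credentials.
Connect-AzureAD

# Application ID of Microsoft Azure PowerShell, taken from step 4
$appId = "1950a258-227b-4e31-a9cf-717495945fc2"

# Steps 3 and 4: reuse the service principal if it exists, otherwise create it
$powershellSPN = Get-AzureADServicePrincipal -Filter "AppId eq '$appId'"
if (-not $powershellSPN) {
    $powershellSPN = New-AzureADServicePrincipal -AppId $appId
}

# Step 5a: reuse the HRD policy if it exists, otherwise create it
$policy = Get-AzureADPolicy |
    Where-Object { $_.DisplayName -eq "EnableDirectAuth" } |
    Select-Object -First 1
if (-not $policy) {
    $policy = New-AzureADPolicy `
        -Definition @('{"HomeRealmDiscoveryPolicy":{"AllowCloudPasswordValidation":true}}') `
        -DisplayName "EnableDirectAuth" `
        -Type "HomeRealmDiscoveryPolicy"
}

# Step 5b: attach the policy only if it isn't attached already
$attached = Get-AzureADServicePrincipalPolicy -Id $powershellSPN.ObjectId |
    Where-Object { $_.Id -eq $policy.Id }
if (-not $attached) {
    Add-AzureADServicePrincipalPolicy `
        -Id $powershellSPN.ObjectId `
        -RefObjectId $policy.Id
}
```

Because every lookup result is stored in a variable before it is used, the final attach call has a valid service principal and policy whether or not the objects existed beforehand.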

Next steps