Azure HDInsight ID Broker (HIB)

This article describes how to set up and use the Azure HDInsight ID Broker feature. You can use this feature to get modern OAuth authentication to Apache Ambari and enforce multifactor authentication, without needing legacy password hashes in Azure Active Directory Domain Services (Azure AD DS).

Overview

HDInsight ID Broker simplifies complex authentication setups in the following scenarios:

  • Your organization relies on federation to authenticate users for accessing cloud resources. Previously, to use HDInsight Enterprise Security Package clusters, you had to enable password hash sync from your on-premises environment to Azure Active Directory (Azure AD). This requirement might be difficult or undesirable for some organizations.
  • Your organization wants to enforce multifactor authentication for web-based or HTTP-based access to Apache Ambari and other cluster resources.

HDInsight ID Broker provides the authentication infrastructure that enables protocol transition from OAuth (modern) to Kerberos (legacy) without needing to sync password hashes to Azure AD DS. This infrastructure consists of components running on a Windows Server virtual machine (VM) with the HDInsight ID Broker node enabled, along with the cluster gateway nodes.

Use the following table to determine the best authentication option based on your organization's needs.

| Authentication option | HDInsight configuration | Factors to consider |
|---|---|---|
| Fully OAuth | Enterprise Security Package + HDInsight ID Broker | Most secure option (multifactor authentication is supported). Password hash sync is not required. No ssh/kinit/keytab access for on-premises accounts, which don't have a password hash in Azure AD DS. Cloud-only accounts can still ssh/kinit/keytab. Web-based access to Ambari through OAuth. Requires updating legacy apps (for example, JDBC/ODBC) to support OAuth. |
| OAuth + Basic Auth | Enterprise Security Package + HDInsight ID Broker | Web-based access to Ambari through OAuth. Legacy apps continue to use basic auth. Multifactor authentication must be disabled for basic auth access. Password hash sync is not required. No ssh/kinit/keytab access for on-premises accounts, which don't have a password hash in Azure AD DS. Cloud-only accounts can still ssh/kinit. |
| Fully Basic Auth | Enterprise Security Package | Most similar to on-premises setups. Password hash sync to Azure AD DS is required. On-premises accounts can ssh/kinit or use keytab. Multifactor authentication must be disabled if the backing storage is Azure Data Lake Storage Gen2. |

The following diagram shows the modern OAuth-based authentication flow for all users, including federated users, after HDInsight ID Broker is enabled:

[Figure: Diagram that shows the authentication flow with HDInsight ID Broker.]

In this diagram, the client (that is, a browser or app) first needs to acquire the OAuth token. It then presents the token to the gateway in an HTTP request. If you've already signed in to other Azure services, such as the Azure portal, you can sign in to your HDInsight cluster with a single sign-on experience.

Many legacy applications might still support only basic authentication (that is, username and password). For those scenarios, you can still use HTTP basic authentication to connect to the cluster gateways. In this setup, you must ensure that the gateway nodes have network connectivity (a direct line of sight) to the Active Directory Federation Services (AD FS) endpoint.
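
As a minimal sketch of what a legacy client sends in this scenario, the following Python snippet builds a standard HTTP basic authentication header (RFC 7617). The function name and the credentials are placeholders chosen for illustration; they do not come from the HDInsight documentation.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the HTTP Authorization header for basic authentication (RFC 7617)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, for illustration only.
headers = basic_auth_header("alice@contoso.com", "example-password")
print(headers)
```

Because basic authentication sends the password on every request, it should only be used over HTTPS.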

The following diagram shows the basic authentication flow for federated users. First, the gateway attempts to complete the authentication by using the resource owner password credentials (ROPC) flow. If no password hashes are synced to Azure AD, the gateway falls back to discovering the AD FS endpoint and completes the authentication by accessing that endpoint.
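
The fallback order described above can be sketched roughly as follows. This is not the gateway's actual implementation: the three callables (`try_ropc`, `discover_adfs_endpoint`, `authenticate_with_adfs`) are hypothetical stand-ins supplied by the caller, and the stubs below exist only to make the control flow visible.

```python
from typing import Callable, Optional


def authenticate_basic(
    username: str,
    password: str,
    try_ropc: Callable[[str, str], Optional[str]],
    discover_adfs_endpoint: Callable[[str], str],
    authenticate_with_adfs: Callable[[str, str, str], Optional[str]],
) -> Optional[str]:
    """Return a token from ROPC if it succeeds; otherwise fall back to AD FS."""
    token = try_ropc(username, password)
    if token is not None:
        return token
    # ROPC did not succeed (for example, no password hash is synced to Azure AD),
    # so discover the AD FS endpoint and authenticate there instead.
    endpoint = discover_adfs_endpoint(username)
    return authenticate_with_adfs(endpoint, username, password)


# Hypothetical stubs that record the order in which each step is attempted.
calls = []


def fake_ropc(username, password):
    calls.append("ropc")
    return None  # simulate a federated user without a synced password hash


def fake_discover(username):
    calls.append("discover")
    return "https://adfs.contoso.com/example"  # placeholder endpoint


def fake_adfs(endpoint, username, password):
    calls.append("adfs")
    return "token-from-adfs"


result = authenticate_basic(
    "alice@contoso.com", "example-password", fake_ropc, fake_discover, fake_adfs
)
print(result, calls)
```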

[Figure: Diagram that shows the architecture with basic authentication.]

Enable HDInsight ID Broker

To create an Enterprise Security Package cluster with HDInsight ID Broker enabled:

  1. Sign in to the Azure portal.
  2. Follow the basic creation steps for an Enterprise Security Package cluster. For more information, see Create an HDInsight cluster with Enterprise Security Package.
  3. Select Enable HDInsight ID Broker.

The HDInsight ID Broker feature adds one extra VM to the cluster. This VM is the HDInsight ID Broker node, and it includes server components to support authentication. The HDInsight ID Broker node is domain joined to the Azure AD DS domain.

[Figure: Diagram that shows the option to enable HDInsight ID Broker.]

Use Azure Resource Manager templates

If you add a new role called idbrokernode with the following attributes to the compute profile of your template, the cluster is created with the HDInsight ID Broker node enabled:

.
.
.
"computeProfile": {
    "roles": [
        {
            "autoscale": null,
            "name": "headnode",
            ....
        },
        {
            "autoscale": null,
            "name": "workernode",
            ....
        },
        {
            "autoscale": null,
            "name": "idbrokernode",
            "targetInstanceCount": 2,
            "hardwareProfile": {
                "vmSize": "Standard_A2_V2"
            },
            "virtualNetworkProfile": {
                "id": "string",
                "subnet": "string"
            },
            "scriptActions": [],
            "dataDisksGroups": null
        }
    ]
}
.
.
.

To see a complete sample of an ARM template, see the template published here.
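
If you generate templates programmatically, the role can be appended with a few lines of code. The following Python sketch assumes the compute profile is already loaded as a dictionary; the helper name `add_id_broker_role` and the placeholder network values are illustrative assumptions, while the role attributes mirror the fragment above.

```python
import copy
import json


def add_id_broker_role(compute_profile: dict, vnet_id: str, subnet: str) -> dict:
    """Return a copy of compute_profile with an idbrokernode role appended."""
    updated = copy.deepcopy(compute_profile)
    roles = updated.setdefault("roles", [])
    if any(role.get("name") == "idbrokernode" for role in roles):
        return updated  # the role is already present; leave the profile as-is
    roles.append(
        {
            "autoscale": None,
            "name": "idbrokernode",
            "targetInstanceCount": 2,
            "hardwareProfile": {"vmSize": "Standard_A2_V2"},
            "virtualNetworkProfile": {"id": vnet_id, "subnet": subnet},
            "scriptActions": [],
            "dataDisksGroups": None,
        }
    )
    return updated


# Placeholder profile and network values, for illustration only.
profile = {"roles": [{"name": "headnode"}, {"name": "workernode"}]}
updated = add_id_broker_role(profile, "<virtual-network-id>", "<subnet-id>")
print(json.dumps(updated, indent=4))
```

The helper copies the input rather than mutating it, so the original profile can still be used to deploy a cluster without the HDInsight ID Broker node.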

Tool integration

HDInsight tools are updated to natively support OAuth. Use these tools for modern OAuth-based access to the clusters. The HDInsight IntelliJ plug-in can be used for Java-based applications, including those written in Scala. Spark and Hive Tools for Visual Studio Code can be used for PySpark and Hive jobs. The tools support both batch and interactive jobs.

SSH access without a password hash in Azure AD DS

| SSH option | Factors to consider |
|---|---|
| Local VM account (for example, sshuser) | You provided this account when you created the cluster. There's no Kerberos authentication for this account. |
| Cloud-only account (for example, alice@contoso.onmicrosoft.com) | The password hash is available in Azure AD DS. Kerberos authentication is possible via SSH Kerberos. |
| On-premises account (for example, alice@contoso.com) | SSH Kerberos authentication is possible only if a password hash is available in Azure AD DS. Otherwise, this user can't SSH to the cluster. |

To SSH to a domain-joined VM or to run the kinit command, you must provide a password. SSH Kerberos authentication requires the hash to be available in Azure AD DS. If you want to use SSH for administrative scenarios only, you can create one cloud-only account and use it to SSH to the cluster. Other on-premises users can still use Ambari, HDInsight tools, or HTTP basic auth without having the password hash available in Azure AD DS.
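
The SSH rules above can be summarized as a small decision helper. This Python sketch only encodes the rules stated in this section; the enum, the function name, and the returned strings are illustrative and not part of any HDInsight API.

```python
from enum import Enum


class AccountType(Enum):
    LOCAL_VM = "local VM account"
    CLOUD_ONLY = "cloud-only account"
    ON_PREMISES = "on-premises account"


def ssh_access(account: AccountType, hash_in_azure_ad_ds: bool = False) -> str:
    """Describe the SSH access available to an account type.

    hash_in_azure_ad_ds matters only for on-premises accounts; cloud-only
    accounts are treated as having their password hash in Azure AD DS.
    """
    if account is AccountType.LOCAL_VM:
        return "SSH without Kerberos authentication"
    if account is AccountType.CLOUD_ONLY:
        return "SSH with Kerberos authentication"
    if hash_in_azure_ad_ds:
        return "SSH with Kerberos authentication"
    return "no SSH access"


for account in AccountType:
    print(account.value, "->", ssh_access(account))
```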

If your organization isn't syncing password hashes to Azure AD DS, as a best practice, create one cloud-only user in Azure AD. Then assign that user as the cluster admin when you create the cluster, and use the account for administration purposes. You can use it to get root access to the VMs via SSH.

To troubleshoot authentication issues, see this guide.

Clients using OAuth to connect to an HDInsight gateway with HDInsight ID Broker

In the HDInsight ID Broker setup, custom apps and clients that connect to the gateway can be updated to acquire the required OAuth token first. Follow the steps in this document to acquire the token with the following information:

  • OAuth resource URI: https://hib.azurehdinsight.cn
  • AppId: 7865c1d2-f040-46cc-875f-831a1ef6a28a
  • Permission: (name: Cluster.ReadWrite, id: 8f89faa0-ffef-4007-974d-4989b39ad77d)

After you acquire the OAuth token, use it in the authorization header of the HTTP request to the cluster gateway (for example, https://<clustername>-int.azurehdinsight.cn). A sample curl command to the Apache Livy API might look like this example:

curl -k -v -H "Authorization: Bearer Access_TOKEN" -H "Content-Type: application/json" -X POST -d '{ "file":"wasbs://mycontainer@mystorageaccount.blob.core.chinacloudapi.cn/data/SparkSimpleTest.jar", "className":"com.microsoft.spark.test.SimpleFile" }' "https://<clustername>-int.azurehdinsight.cn/livy/batches" -H "X-Requested-By:<username@domain.com>"
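
For clients written in Python, a roughly equivalent request can be assembled with the standard library. The sketch below only builds the request object and does not send it; the function name and all argument values are placeholders, and acquiring the access token is assumed to have happened elsewhere.

```python
import json
import urllib.request


def build_livy_batch_request(
    cluster_name: str,
    access_token: str,
    jar_path: str,
    class_name: str,
    requested_by: str,
) -> urllib.request.Request:
    """Build (but do not send) a Livy batch submission with a bearer token."""
    url = f"https://{cluster_name}-int.azurehdinsight.cn/livy/batches"
    body = json.dumps({"file": jar_path, "className": class_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "X-Requested-By": requested_by,
        },
    )


# Placeholder values, for illustration only.
request = build_livy_batch_request(
    cluster_name="mycluster",
    access_token="ACCESS_TOKEN",
    jar_path="wasbs://mycontainer@mystorageaccount.blob.core.chinacloudapi.cn/data/SparkSimpleTest.jar",
    class_name="com.microsoft.spark.test.SimpleFile",
    requested_by="username@domain.com",
)
print(request.get_method(), request.full_url)
# To submit the job, pass the object to urllib.request.urlopen(request).
```

Unlike the curl command, which passes -k to skip certificate validation, urllib.request.urlopen validates the server certificate by default.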

To use Beeline and Livy, you can also follow the sample code provided here to set up your client to use OAuth and connect to the cluster.

FAQ

What app is created by HDInsight in Azure AD?

For each cluster, a third-party application is registered in Azure AD with the cluster URI as the identifierUri (for example, https://clustername.azurehdinsight.cn).

In Azure AD, consent is required for all third-party applications before they can authenticate users or access data.

The Microsoft Graph API allows you to automate the consent; see the API documentation for details. The sequence to automate the consent is:

  • Register an app and grant it Application.ReadWrite.All permissions to access Microsoft Graph.
  • After a cluster is created, query for the cluster app based on the identifier URI.
  • Register consent for the app.
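
As a rough sketch of the second step, the query for the cluster app can be expressed as an OData filter on the identifierUris property of the Microsoft Graph applications collection. The helper name is hypothetical, and the Graph base URL is passed in as a parameter because it depends on your cloud environment; confirm both the endpoint and the filter against the API documentation before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def cluster_app_query_url(graph_base: str, cluster_name: str) -> str:
    """Build a Graph URL that looks up the cluster app by its identifier URI."""
    identifier_uri = f"https://{cluster_name}.azurehdinsight.cn"
    # OData lambda filter: match apps whose identifierUris contain the cluster URI.
    odata_filter = f"identifierUris/any(uri:uri eq '{identifier_uri}')"
    query = urlencode({"$filter": odata_filter})
    return f"{graph_base}/v1.0/applications?{query}"


# The base URL below is an assumption; substitute the Graph endpoint for your cloud.
url = cluster_app_query_url("https://microsoftgraph.chinacloudapi.cn", "clustername")
print(url)
print(parse_qs(urlparse(url).query)["$filter"][0])
```

The request itself must carry a token for the app registered in the first step; that part is omitted here.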

When the cluster is deleted, HDInsight deletes the app, and there is no need to clean up any consent.

Next steps