Azure HDInsight ID Broker (preview)

This article describes how to set up and use the HDInsight ID Broker (HIB) feature in Azure HDInsight. You can use this feature to get modern OAuth authentication to Apache Ambari with Multi-Factor Authentication (MFA) enforcement, without needing legacy password hashes in Azure Active Directory Domain Services (AAD-DS).

Overview

HIB simplifies complex authentication setups in the following scenarios:

  • Your organization relies on federation to authenticate users for accessing cloud resources. Previously, to use HDInsight Enterprise Security Package (ESP) clusters, you had to enable password hash sync from your on-premises environment to Azure Active Directory (Azure AD). This requirement might be difficult or undesirable for some organizations.

  • Your organization would like to enforce MFA for web/HTTP-based access to Apache Ambari and other cluster resources.

HIB provides the authentication infrastructure that enables protocol transition from OAuth (modern) to Kerberos (legacy) without needing to sync password hashes to AAD-DS. This infrastructure consists of components running on a Windows Server VM (the ID Broker node), along with the cluster gateway nodes.

The following diagram shows the modern OAuth-based authentication flow for all users, including federated users, after ID Broker is enabled:

[Figure: Authentication flow with ID Broker]

In this diagram, the client (a browser or an app) needs to acquire the OAuth token first and then present the token to the gateway in an HTTP request. If you've already signed in to other Azure services, such as the Azure portal, you can sign in to your HDInsight cluster with a single sign-on (SSO) experience.

There may still be many legacy applications that support only basic authentication (that is, username and password). For those scenarios, you can still use HTTP basic authentication to connect to the cluster gateways. In this setup, you must ensure network connectivity from the gateway nodes to the federation endpoint (the ADFS endpoint), so that the gateway nodes have a direct line of sight to that endpoint.
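To make concrete what a basic-auth client sends, the following Python sketch builds the standard HTTP Basic Authorization header (RFC 7617) that a legacy client would present to the cluster gateway. The username and password are hypothetical placeholders, and the function name is illustrative; how the gateway then validates these credentials against the federation endpoint is not shown.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617).

    The credentials are joined as "username:password", UTF-8 encoded,
    then Base64 encoded. Base64 is an encoding, not encryption, so the
    header must only ever be sent over HTTPS.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical credentials, for illustration only.
headers = basic_auth_header("alice@contoso.com", "example-password")
print(headers["Authorization"])
```

Because the password itself travels (encoded) with every request, the gateway has to verify it on the user's behalf, which is why the gateway nodes need to reach the federation endpoint in this setup.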

Use the following table to determine the best authentication option for your organization's needs:

Authentication option | HDInsight configuration | Factors to consider
Fully OAuth | ESP + HIB | 1. Most secure option (MFA is supported). 2. Password hash sync is NOT required. 3. No ssh/kinit/keytab access for on-premises accounts that don't have a password hash in AAD-DS. 4. Cloud-only accounts can still use ssh/kinit/keytab. 5. Web-based access to Ambari through OAuth. 6. Requires updating legacy apps (JDBC/ODBC, etc.) to support OAuth.
OAuth + Basic Auth | ESP + HIB | 1. Web-based access to Ambari through OAuth. 2. Legacy apps continue to use basic auth. 3. MFA must be disabled for basic auth access. 4. Password hash sync is NOT required. 5. No ssh/kinit/keytab access for on-premises accounts that don't have a password hash in AAD-DS. 6. Cloud-only accounts can still use ssh/kinit.
Fully Basic Auth | ESP | 1. Most similar to on-premises setups. 2. Password hash sync to AAD-DS is required. 3. On-premises accounts can use ssh/kinit or a keytab. 4. MFA must be disabled if the backing storage is ADLS Gen2.
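The trade-offs in the table can be read as a small set of constraints. The following Python sketch is a hypothetical helper (not part of any HDInsight tooling) that lists which rows of the table fit a given set of requirements. It assumes, as one reading of the table, that any option which keeps a basic-auth path cannot satisfy a requirement to enforce MFA on every access path.

```python
from enum import Enum
from typing import List


class AuthOption(Enum):
    FULLY_OAUTH = "Fully OAuth (ESP + HIB)"
    OAUTH_PLUS_BASIC = "OAuth + Basic Auth (ESP + HIB)"
    FULLY_BASIC = "Fully Basic Auth (ESP)"


def viable_options(password_hash_sync: bool,
                   legacy_apps_updatable: bool,
                   mfa_everywhere: bool) -> List[AuthOption]:
    """Return the table rows that fit the given constraints, in table order.

    password_hash_sync:    can password hashes be synced to AAD-DS?
    legacy_apps_updatable: can all legacy apps (JDBC/ODBC, etc.) move to OAuth?
    mfa_everywhere:        must MFA be enforced on every access path?
    """
    options = []
    # Fully OAuth needs no hash sync, but every client must speak OAuth.
    if legacy_apps_updatable:
        options.append(AuthOption.FULLY_OAUTH)
    # The remaining options keep a basic-auth path, where MFA must be off.
    if not mfa_everywhere:
        options.append(AuthOption.OAUTH_PLUS_BASIC)
        # Fully Basic Auth additionally requires password hash sync to AAD-DS.
        if password_hash_sync:
            options.append(AuthOption.FULLY_BASIC)
    return options


print([o.value for o in viable_options(password_hash_sync=False,
                                       legacy_apps_updatable=False,
                                       mfa_everywhere=False)])
# prints ['OAuth + Basic Auth (ESP + HIB)']
```

An empty result (for example, MFA required everywhere while some legacy apps cannot move to OAuth) signals that one of the requirements has to be relaxed.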

Enable HDInsight ID Broker

To create an ESP cluster with ID Broker enabled, take the following steps:

  1. Sign in to the Azure portal.
  2. Follow the basic creation steps for an ESP cluster. For more information, see Create an HDInsight cluster with ESP.
  3. Select Enable HDInsight ID Broker.

The ID Broker feature adds one extra VM to the cluster. This VM is the ID Broker node and includes the server components that support authentication. The ID Broker node is joined to the Azure AD DS domain.

[Figure: Option to enable ID Broker]

Using Azure Resource Manager templates

If you add a new role called idbrokernode with the following attributes to the compute profile of your template, the cluster is created with the ID Broker node enabled:

.
.
.
"computeProfile": {
    "roles": [
        {
            "autoscale": null,
            "name": "headnode",
            ....
        },
        {
            "autoscale": null,
            "name": "workernode",
            ....
        },
        {
            "autoscale": null,
            "name": "idbrokernode",
            "targetInstanceCount": 1,
            "hardwareProfile": {
                "vmSize": "Standard_A2_V2"
            },
            "virtualNetworkProfile": {
                "id": "string",
                "subnet": "string"
            },
            "scriptActions": [],
            "dataDisksGroups": null
        }
    ]
}
.
.
.
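If you generate or patch templates programmatically, the following Python sketch adds the role shown above to a template's computeProfile. It works on a plain dictionary (for example, one loaded with json.load from your template file); the function name enable_id_broker is hypothetical, and the "id" and "subnet" values are the same placeholders as in the fragment above, which you would replace with your own virtual network details.

```python
import copy
import json

# Role definition copied from the template fragment above.
# "id" and "subnet" are placeholders for your virtual network settings.
ID_BROKER_ROLE = {
    "autoscale": None,
    "name": "idbrokernode",
    "targetInstanceCount": 1,
    "hardwareProfile": {"vmSize": "Standard_A2_V2"},
    "virtualNetworkProfile": {"id": "string", "subnet": "string"},
    "scriptActions": [],
    "dataDisksGroups": None,
}


def enable_id_broker(compute_profile: dict) -> dict:
    """Add the idbrokernode role to a computeProfile unless it is already there."""
    roles = compute_profile.setdefault("roles", [])
    if not any(role.get("name") == "idbrokernode" for role in roles):
        # Deep copy so later edits to one template don't leak into others.
        roles.append(copy.deepcopy(ID_BROKER_ROLE))
    return compute_profile


profile = {"roles": [{"name": "headnode"}, {"name": "workernode"}]}
enable_id_broker(profile)
enable_id_broker(profile)  # calling twice does not add a second role
print([role["name"] for role in profile["roles"]])
# prints ['headnode', 'workernode', 'idbrokernode']
print(json.dumps(profile["roles"][-1], indent=2))
```

Making the function idempotent means it can run safely on templates that already enable ID Broker.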

Tool integration

HDInsight tools are updated to natively support OAuth. We highly recommend using these tools for modern OAuth-based access to the clusters. The HDInsight IntelliJ plug-in can be used for Java and other JVM-based applications, such as Scala applications. Spark & Hive Tools for VS Code can be used for PySpark and Hive jobs. Both tools support batch and interactive jobs.

SSH access without a password hash in Azure AD DS

SSH option | Factors to consider
Local VM account (e.g. sshuser) | 1. You provided this account at cluster creation time. 2. There is no Kerberos authentication for this account.
Cloud-only account (e.g. alice@contoso.onmicrosoft.com) | 1. The password hash is available in AAD-DS. 2. Kerberos authentication is possible via SSH Kerberos.
On-premises account (e.g. alice@contoso.com) | 1. SSH Kerberos authentication is possible only if the password hash is available in AAD-DS; otherwise, this user can't SSH to the cluster.

To SSH to a domain-joined VM, or to run the kinit command, you need to provide a password. SSH Kerberos authentication requires the password hash to be available in AAD-DS. If you want to use SSH for administrative scenarios only, you can create one cloud-only account and use it to SSH to the cluster. Other on-premises users can still use Ambari, HDInsight tools, or HTTP basic auth without having their password hash available in AAD-DS.

If your organization isn't syncing password hashes to AAD-DS, a best practice is to create one cloud-only user in Azure AD, assign it as the cluster admin when you create the cluster, and use it for administration purposes, including getting root access to the VMs via SSH.
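The SSH options table and the guidance above reduce to a simple rule. The following hypothetical Python helper encodes that rule, which may be handy when auditing which admin accounts will be able to use Kerberos over SSH on an HIB cluster; the type and function names are illustrative and are not part of any HDInsight API.

```python
from enum import Enum


class AccountType(Enum):
    LOCAL_VM = "local VM account (e.g. sshuser)"
    CLOUD_ONLY = "cloud-only account (e.g. alice@contoso.onmicrosoft.com)"
    ON_PREMISES = "on-premises account (e.g. alice@contoso.com)"


def can_use_ssh_kerberos(account: AccountType, hash_in_aadds: bool) -> bool:
    """Return True if the account can authenticate with Kerberos over SSH.

    hash_in_aadds: whether the account's password hash is available in
    AAD-DS. Per the table, it only matters for on-premises accounts.
    """
    if account is AccountType.LOCAL_VM:
        # Created at cluster creation time; SSH works, but without Kerberos.
        return False
    if account is AccountType.CLOUD_ONLY:
        # Cloud-only accounts have their password hash in AAD-DS.
        return True
    # On-premises accounts need password hash sync to AAD-DS.
    return hash_in_aadds


# An organization that does not sync password hashes to AAD-DS:
for acct in AccountType:
    print(acct.value, "->", can_use_ssh_kerberos(acct, hash_in_aadds=False))
```

With hashes not synced, only the cloud-only account comes out True, which is why the recommended admin account is a cloud-only user.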

To troubleshoot authentication issues, see this guide.

Clients using OAuth to connect to the HDInsight gateway with HIB

In the HIB setup, custom apps and clients that connect to the gateway can be updated to acquire the required OAuth token first. You can follow the steps in this document to acquire the token with the following information:

  • OAuth resource URI: https://hib.azurehdinsight.cn
  • AppId: 7865c1d2-f040-46cc-875f-831a1ef6a28a
  • Permission: (name: Cluster.ReadWrite, id: 8f89faa0-ffef-4007-974d-4989b39ad77d)
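As a sketch of the token step, the following Python code uses the Microsoft Authentication Library (MSAL) for Python. Several details are assumptions rather than values from this article: the client application is your own app registration (its ID and your tenant ID are placeholders), that registration is assumed to have been granted the Cluster.ReadWrite permission listed above, the scope is assumed to follow the "<resource URI>/<permission name>" convention, and the authority is assumed to be the Azure China sign-in endpoint. MSAL's interactive flow also expects the registration to allow a localhost redirect URI.

```python
HIB_RESOURCE_URI = "https://hib.azurehdinsight.cn"
HIB_PERMISSION = "Cluster.ReadWrite"  # permission id 8f89faa0-ffef-4007-974d-4989b39ad77d


def hib_scopes() -> list:
    # Assumption: the "<resource URI>/<permission name>" scope convention.
    return [f"{HIB_RESOURCE_URI}/{HIB_PERMISSION}"]


def acquire_hib_token(client_id: str, tenant_id: str) -> str:
    """Sign the user in interactively and return an OAuth access token for HIB.

    client_id and tenant_id are placeholders for YOUR OWN app registration
    and tenant. The registration is assumed to have been granted the
    Cluster.ReadWrite permission on the HIB app
    (AppId 7865c1d2-f040-46cc-875f-831a1ef6a28a).
    """
    import msal  # third-party package: pip install msal

    app = msal.PublicClientApplication(
        client_id,
        authority=f"https://login.chinacloudapi.cn/{tenant_id}",  # assumed Azure China authority
    )
    # Opens a browser window for sign-in, so any MFA prompt is shown there.
    result = app.acquire_token_interactive(scopes=hib_scopes())
    if "access_token" not in result:
        raise RuntimeError(result.get("error_description", "token acquisition failed"))
    return result["access_token"]


print(hib_scopes())
```

The msal import sits inside the function so that the scope logic can be inspected without the package installed; calling acquire_hib_token requires it.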

After acquiring the OAuth token, you can use it in the Authorization header of the HTTP request to the cluster gateway (for example, https://<clustername>-int.azurehdinsight.cn). For example, a sample curl command to the Apache Livy API might look like this:

curl -k -v \
  -H "Authorization: Bearer <ACCESS_TOKEN>" \
  -H "Content-Type: application/json" \
  -H "X-Requested-By: <username@domain.com>" \
  -X POST \
  -d '{ "file":"wasbs://mycontainer@mystorageaccount.blob.core.chinacloudapi.cn/data/SparkSimpleTest.jar", "className":"com.microsoft.spark.test.SimpleFile" }' \
  "https://<clustername>-int.azurehdinsight.cn/livy/batches"
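The same Livy call can be made from Python with only the standard library. The following sketch builds, but does not send, a request equivalent to the curl command above; the function name, the cluster name "mycluster", and the placeholder token are hypothetical.

```python
import json
import urllib.request


def build_livy_batch_request(cluster: str, token: str, user: str,
                             jar_path: str, class_name: str) -> urllib.request.Request:
    """Build (but do not send) the Livy batch POST from the curl example."""
    body = json.dumps({"file": jar_path, "className": class_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{cluster}-int.azurehdinsight.cn/livy/batches",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # OAuth token acquired earlier
            "Content-Type": "application/json",
            "X-Requested-By": user,
        },
    )


req = build_livy_batch_request(
    cluster="mycluster",  # hypothetical cluster name
    token="<ACCESS_TOKEN>",
    user="username@domain.com",
    jar_path="wasbs://mycontainer@mystorageaccount.blob.core.chinacloudapi.cn/data/SparkSimpleTest.jar",
    class_name="com.microsoft.spark.test.SimpleFile",
)
print(req.get_method(), req.full_url)
# prints POST https://mycluster-int.azurehdinsight.cn/livy/batches
# To send it: urllib.request.urlopen(req)
```

Unlike the curl command's -k flag, urllib.request.urlopen verifies the gateway's TLS certificate by default.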

Next steps