Authentication issues in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.

On secure clusters backed by Azure Data Lake, when domain users sign in to the cluster services through HDI Gateway (for example, signing in to the Apache Ambari portal), HDI Gateway first tries to obtain an OAuth token from Azure Active Directory (Azure AD), and then gets a Kerberos ticket from Azure AD DS. Authentication can fail at either of these stages. This article helps you debug some of those issues.

When authentication fails, you are prompted for credentials. If you cancel this dialog, the error message is printed. Here are some of the common error messages:

invalid_grant or unauthorized_client, 50126

Issue

Sign-in fails for federated users with error code 50126 (sign-in succeeds for cloud users). The error message is similar to:

Reason: Bad Request, Detailed Response: {"error":"invalid_grant","error_description":"AADSTS70002: Error validating credentials. AADSTS50126: Invalid username or password\r\nTrace ID: 09cc9b95-4354-46b7-91f1-efd92665ae00\r\n Correlation ID: 4209bedf-f195-4486-b486-95a15b70fbe4\r\nTimestamp: 2019-01-28 17:49:58Z","error_codes":[70002,50126], "timestamp":"2019-01-28 17:49:58Z","trace_id":"09cc9b95-4354-46b7-91f1-efd92665ae00","correlation_id":"4209bedf-f195-4486-b486-95a15b70fbe4"}

Cause

Azure AD error code 50126 means the tenant has not set the AllowCloudPasswordValidation policy.

Resolution

The Company Administrator of the Azure AD tenant should enable Azure AD to use password hashes for ADFS-backed users. Apply the AllowCloudPasswordValidationPolicy as shown in the article Use Enterprise Security Package in HDInsight.


invalid_grant or unauthorized_client, 50034

Issue

Sign-in fails with error code 50034. The error message is similar to:

{"error":"invalid_grant","error_description":"AADSTS50034: The user account Microsoft.AzureAD.Telemetry.Diagnostics.PII does not exist in the 0c349e3f-1ac3-4610-8599-9db831cbaf62 directory. To sign into this application, the account must be added to the directory.\r\nTrace ID: bbb819b2-4c6f-4745-854d-0b72006d6800\r\nCorrelation ID: b009c737-ee52-43b2-83fd-706061a72b41\r\nTimestamp: 2019-04-29 15:52:16Z", "error_codes":[50034],"timestamp":"2019-04-29 15:52:16Z","trace_id":"bbb819b2-4c6f-4745-854d-0b72006d6800", "correlation_id":"b009c737-ee52-43b2-83fd-706061a72b41"}

Cause

The user name is incorrect (it does not exist in the directory). The user is not using the same user name that is used in the Azure portal.

Resolution

Use the same user name that works in the Azure portal.


invalid_grant or unauthorized_client, 50053

Issue

The user account is locked out, with error code 50053. The error message is similar to:

{"error":"unauthorized_client","error_description":"AADSTS50053: You've tried to sign in too many times with an incorrect user ID or password.\r\nTrace ID: 844ac5d8-8160-4dee-90ce-6d8c9443d400\r\nCorrelation ID: 23fe8867-0e8f-4e56-8764-0cdc7c61c325\r\nTimestamp: 2019-06-06 09:47:23Z","error_codes":[50053],"timestamp":"2019-06-06 09:47:23Z","trace_id":"844ac5d8-8160-4dee-90ce-6d8c9443d400","correlation_id":"23fe8867-0e8f-4e56-8764-0cdc7c61c325"}

Cause

Too many sign-in attempts were made with an incorrect password.

Resolution

Wait about 30 minutes, and stop any applications that might be trying to authenticate.


user_password_expired, 50055

Issue

The password has expired, with error code 50055. The error message is similar to:

{"error":"user_password_expired","error_description":"AADSTS50055: Password is expired.\r\nTrace ID: 241a7a47-e59f-42d8-9263-fbb7c1d51e00\r\nCorrelation ID: c7fe4a42-67e4-4acd-9fb6-f4fb6db76d6a\r\nTimestamp: 2019-06-06 17:29:37Z","error_codes":[50055],"timestamp":"2019-06-06 17:29:37Z","trace_id":"241a7a47-e59f-42d8-9263-fbb7c1d51e00","correlation_id":"c7fe4a42-67e4-4acd-9fb6-f4fb6db76d6a","suberror":"user_password_expired","password_change_url":"https://portal.microsoftonline.com/ChangePassword.aspx"}

Cause

The password has expired.

Resolution

Change the password in the Azure portal (on your on-premises system), and then wait 30 minutes for the sync to catch up.
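
As a rough aid for the Azure AD errors above, the following Python sketch pulls the error_codes array out of a gateway error message and looks each code up in a small table. The table only paraphrases the causes listed in this article and is not a complete list of Azure AD codes. The function name explain_gateway_error and the abridged sample string are illustrative assumptions, not part of HDInsight.

```python
import json

# Causes paraphrased from this article; not an exhaustive Azure AD code list.
KNOWN_CAUSES = {
    50126: "AllowCloudPasswordValidation policy not set by the tenant",
    50034: "user name does not exist in the directory",
    50053: "account locked after too many failed sign-in attempts",
    50055: "password expired",
}


def explain_gateway_error(message):
    """Return 'code: cause' strings for the known codes in an error message.

    The message may carry a prefix such as 'Reason: Bad Request, Detailed
    Response:' before the JSON body, so parsing starts at the first '{'.
    """
    start = message.find("{")
    if start == -1:
        return []
    try:
        # raw_decode tolerates trailing text after the JSON object.
        payload, _ = json.JSONDecoder().raw_decode(message, start)
    except ValueError:
        return []
    codes = payload.get("error_codes", [])
    return [f"{code}: {KNOWN_CAUSES[code]}" for code in codes if code in KNOWN_CAUSES]


# Abridged from the 50126 sample in this article (hypothetical shortening).
sample = (
    "Reason: Bad Request, Detailed Response: "
    '{"error":"invalid_grant","error_codes":[70002,50126]}'
)
print(explain_gateway_error(sample))
```

Codes that are not in the table (such as 70002 in the sample) are skipped rather than guessed at.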


interaction_required

Issue

You receive the error message interaction_required.

Cause

A conditional access policy or MFA is being applied to the user. Because interactive authentication is not supported yet, the user or the cluster needs to be exempted from MFA / conditional access. If you choose to exempt the cluster (IP address based exemption policy), make sure that the AD ServiceEndpoints are enabled for that vnet.

Resolution

Use a conditional access policy and exempt the HDInsight clusters from MFA, as shown in Configure a HDInsight cluster with Enterprise Security Package by using Azure Active Directory Domain Services.


Sign in denied

Issue

Sign-in is denied.

Cause

If you reach this stage, OAuth authentication is not the issue, but Kerberos authentication is. If the cluster is backed by ADLS, the OAuth sign-in has already succeeded before Kerberos authentication is attempted. On WASB clusters, OAuth sign-in is not attempted. Kerberos can fail for many reasons, for example because password hashes are out of sync or the user account is locked out in Azure AD DS. Password hashes sync only when the user changes the password. When you create the Azure AD DS instance, it starts syncing passwords that are changed after its creation. It won't retroactively sync passwords that were set before its inception.

Resolution

If you think passwords may not be in sync, try changing the password and wait a few minutes for the sync to complete.

Try to authenticate (kinit) with the same user credentials from a machine that is joined to the domain. SSH into the head / edge node with a local user and then run kinit.


kinit fails

Issue

kinit fails.

Cause

Varies.

Resolution

For kinit to succeed, you need to know your sAMAccountName (the short account name without the realm). sAMAccountName is usually the account prefix (like bob in bob@contoso.com). For some users, it could be different. You need the ability to browse / search the directory to learn your sAMAccountName.

Ways to find sAMAccountName:

  • If you can sign in to Ambari using the local Ambari admin, look at the list of users.

  • If you have a domain-joined Windows machine, you can use the standard Windows AD tools to browse. This requires a working account in the domain.

  • From the head node, you can use SAMBA commands to search. This requires a valid Kerberos session (a successful kinit). For example: net ads search "(userPrincipalName=bob*)"

    The search / browse results should show you the sAMAccountName attribute. You can also look at other attributes such as pwdLastSet, badPasswordTime, and userPrincipalName to see if they match what you expect.
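
To make the account-prefix convention concrete, here is a minimal Python sketch that derives the usual sAMAccountName candidate from a UPN and builds the matching net ads search command line from this article. The helper names are hypothetical, the result is only a first guess (the real sAMAccountName can differ for some users), and the sketch does not escape LDAP filter special characters.

```python
def guess_sam_account_name(upn):
    """Return the part before '@', which is usually (not always) the sAMAccountName."""
    return upn.split("@", 1)[0]


def net_ads_search_command(upn):
    """Build the 'net ads search' command shown in this article for a given UPN.

    Assumption: the UPN prefix contains no LDAP filter special characters
    such as '*', '(', ')' or a backslash; no escaping is done here.
    """
    prefix = guess_sam_account_name(upn)
    return f'net ads search "(userPrincipalName={prefix}*)"'


print(guess_sam_account_name("bob@contoso.com"))
print(net_ads_search_command("bob@contoso.com"))
```

Run the printed command on the head node after a successful kinit, then compare the sAMAccountName attribute in the output with the guess.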


kinit fails with Preauthentication failure

Issue

kinit fails with Preauthentication failure.

Cause

The user name or password is incorrect.

Resolution

Check your user name and password. Also check the other properties described above. To enable verbose debugging, run export KRB5_TRACE=/tmp/krb.log from the session before trying kinit.
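
If you drive kinit from a script rather than an interactive shell, the same tracing can be set up per process. The Python sketch below only prepares the command and environment; the helper name kinit_trace_invocation and the account name bob are illustrative assumptions, and the trace path is the one used in this article.

```python
import os


def kinit_trace_invocation(sam_account_name, trace_path="/tmp/krb.log"):
    """Return (argv, env) for running kinit with KRB5_TRACE pointing at trace_path."""
    env = dict(os.environ)          # copy, so the parent environment is untouched
    env["KRB5_TRACE"] = trace_path  # same effect as: export KRB5_TRACE=/tmp/krb.log
    return ["kinit", sam_account_name], env


argv, env = kinit_trace_invocation("bob")
print(argv, env["KRB5_TRACE"])

# To actually run it on a head or edge node (kinit prompts for the password):
#   import subprocess
#   subprocess.run(argv, env=env, check=False)
```

After the attempt, inspect the trace file for the preauthentication exchange.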


Job / HDFS command fails due to TokenNotFoundException

Issue

A job or HDFS command fails due to TokenNotFoundException.

Cause

The OAuth access token required for the job / command to succeed was not found. The ADLS / ABFS driver tries to retrieve the OAuth access token from the credential service before making storage requests. This token is registered when you sign in to the Ambari portal with the same user.

Resolution

Ensure that you have successfully signed in to the Ambari portal at least once with the user name whose identity is used to run the job.


Error fetching access token

Issue

The user receives the error message "Error fetching access token".

Cause

This error occurs intermittently when users try to access ADLS Gen2 using ACLs and the Kerberos token has expired.

Resolution

  • For Azure Data Lake Storage Gen2, run /usr/lib/hdinsight-common/scripts/RegisterKerbTicketAndOAuth.sh <upn> for the account that the user is trying to sign in as.

Next steps

If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:

  • Get answers from Azure experts through Azure Community Support.

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.