LDAP sync in Ranger and Apache Ambari in Azure HDInsight

HDInsight Enterprise Security Package (ESP) clusters use Ranger for authorization. Apache Ambari and Ranger each sync users and groups independently, and they work a little differently. This article describes how LDAP sync works in Ranger and Ambari and how to troubleshoot it.

General guidelines

  • Always deploy clusters with groups.
  • Instead of changing group filters in Ambari and Ranger, try to manage group membership in Azure AD and use nested groups to bring in the required users.
  • Once a user is synced, the user isn't removed, even if the user is no longer a member of the synced groups.
  • If you need to change the LDAP filters directly, use the UI first, because it performs some validation.

Users are synced separately

Ambari and Ranger don't share a user database, because they serve two different purposes. If a user needs to use the Ambari UI, the user must be synced to Ambari. If the user isn't synced to Ambari, the Ambari UI and API reject the user, but other parts of the system still work (these are guarded by Ranger or Resource Manager and not Ambari). If you want to include the user in a Ranger policy, sync the user to Ranger.

When a secure cluster is deployed, group members are synced transitively (all the subgroups and their members) to both Ambari and Ranger.
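To make "transitively" concrete, the following is a minimal sketch of how nested group membership expands into a flat set of users. The group names, user names, and the two dictionaries are hypothetical illustration data; the sketch doesn't query a real directory and isn't the code that Ambari or Ranger runs.

```python
from collections import deque


def transitive_members(group, subgroups, members):
    """Return every user in `group` and in any of its nested subgroups.

    `subgroups` maps a group name to its direct child groups and
    `members` maps a group name to its direct user members. Both are
    hypothetical in-memory stand-ins for directory data.
    """
    seen_groups = {group}
    queue = deque([group])
    users = set()
    while queue:
        current = queue.popleft()
        users.update(members.get(current, ()))
        for child in subgroups.get(current, ()):
            # Guard against cycles in the nesting.
            if child not in seen_groups:
                seen_groups.add(child)
                queue.append(child)
    return users


# Hypothetical nesting: hadoopgroup -> analysts -> interns
subgroups = {"hadoopgroup": ["analysts"], "analysts": ["interns"]}
members = {"hadoopgroup": ["bob"], "analysts": ["alice"], "interns": ["carol"]}

print(sorted(transitive_members("hadoopgroup", subgroups, members)))
# prints ['alice', 'bob', 'carol']
```

Syncing only the top-level group therefore brings in users from every nested group, which is why nested groups in Azure AD are the suggested way to add users.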

Ambari user sync and configuration

On the head nodes, a cron job runs /opt/startup_scripts/start_ambari_ldap_sync.py every hour to schedule the user sync. The cron job calls the Ambari REST API to perform the sync. The script submits a list of users and groups to sync; because the users may not belong to the specified groups, both are specified individually. Ambari syncs the sAMAccountName as the username and syncs all group members transitively.
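As a rough illustration of the kind of request such a script could submit, here is a minimal sketch that builds an LDAP sync event for Ambari's ldap_sync_events REST resource, listing users and groups separately. It is not the contents of start_ambari_ldap_sync.py, which isn't shown here. The function names, the base URL, and the user and group names are assumptions for illustration, and authentication is left out.

```python
import json
import urllib.request


def build_ldap_sync_event(users, groups):
    """Build a sync event body with separate specs for users and groups."""
    specs = []
    if users:
        specs.append({
            "principal_type": "users",
            "sync_type": "specific",
            "names": ",".join(users),
        })
    if groups:
        specs.append({
            "principal_type": "groups",
            "sync_type": "specific",
            "names": ",".join(groups),
        })
    return [{"Event": {"specs": specs}}]


def build_request(base_url, payload):
    """Prepare (but don't send) the POST request; credentials are omitted."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/ldap_sync_events",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"X-Requested-By": "ambari"},
    )


# Hypothetical names and URL, for illustration only.
payload = build_ldap_sync_event(["bob", "hdiwatchdog-core01"], ["hadoopgroup"])
request = build_request("http://headnodehost:8080", payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Sending the request would additionally require Ambari admin credentials and a call such as urllib.request.urlopen(request).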

The logs should be in /var/log/ambari-server/ambari-server.log. For more information, see Configure Ambari logging level.

In Data Lake clusters, the post-user-creation hook creates the home folders for the synced users and sets those users as the owners of their home folders. If a user isn't synced to Ambari correctly, the user could face failures when accessing staging and other temporary folders.

Update groups to be synced to Ambari

If you can't manage group memberships in Azure AD, you have two choices:

Ranger user sync and configuration

Ranger has a built-in sync engine that runs every hour to sync the users. It doesn't share the user database with Ambari. HDInsight configures the search filter to sync the admin user, the watchdog user, and the members of the group specified during cluster creation. Group members are synced transitively, using the following configuration:

  • Disable incremental sync.
  • Enable the user group sync map.
  • Specify the search filter to include the transitive group members.
  • Sync sAMAccountName for users and the name attribute for groups.

Group or incremental sync

Ranger supports a group sync option, but it works as an intersection with the user filter, not as a union of group memberships and the user filter. A typical use case for the group sync filter in Ranger is: group filter (dn=clusteradmingroup), user filter (city=seattle).
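The difference between an intersection and a union can be sketched with plain Python sets. The user names and set contents below are hypothetical; the sketch only illustrates the set semantics, not how Ranger evaluates the filters internally.

```python
# Hypothetical directory contents, for illustration only.
group_members = {"alice", "bob"}   # members matched by the group filter
seattle_users = {"bob", "carol"}   # users matched by the user filter (city=seattle)

# Group sync behaves like an intersection: a user must match both filters.
synced = group_members & seattle_users

# It does not behave like a union of the two result sets.
not_what_happens = group_members | seattle_users

print(sorted(synced))            # prints ['bob']
print(sorted(not_what_happens))  # prints ['alice', 'bob', 'carol']
```

In this example, alice is in the admin group but isn't synced, because she doesn't also match the user filter.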

Incremental sync works only for users who were already synced during the initial sync. It doesn't sync any new users added to the groups after the initial sync.

Update Ranger sync filter

The LDAP filter can be found in the Ambari UI, under the Ranger user-sync configuration section. The existing filter will be in the form (|(userPrincipalName=bob@contoso.com)(userPrincipalName=hdiwatchdog-core01@CONTOSO.ONMICROSOFT.COM)(memberOf:1.2.840.113556.1.4.1941:=CN=hadoopgroup,OU=AADDC Users,DC=contoso,DC=onmicrosoft,DC=com)). Add any new predicate at the end of the filter, and test the filter by using the net ads search command, ldp.exe, or a similar tool.
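Adding a predicate at the end of the filter means inserting it just before the final closing parenthesis of the outer OR clause. The following is a minimal sketch of that string manipulation; the add_predicate helper and the alice@contoso.com user are hypothetical, and the sketch doesn't validate the filter against a directory, so still test the result with one of the tools mentioned above.

```python
def add_predicate(existing_filter, predicate):
    """Append `predicate` as the last clause of an OR filter of the form (|...)."""
    current = existing_filter.strip()
    if not (current.startswith("(|") and current.endswith(")")):
        raise ValueError("expected an OR filter of the form (|...)")
    if not (predicate.startswith("(") and predicate.endswith(")")):
        raise ValueError("predicate must be wrapped in parentheses")
    # Drop the outer closing parenthesis, append the predicate, and close again.
    return current[:-1] + predicate + ")"


existing = (
    "(|(userPrincipalName=bob@contoso.com)"
    "(memberOf:1.2.840.113556.1.4.1941:=CN=hadoopgroup,"
    "OU=AADDC Users,DC=contoso,DC=onmicrosoft,DC=com))"
)
# Hypothetical extra user to include in the sync.
updated = add_predicate(existing, "(userPrincipalName=alice@contoso.com)")
print(updated)
```

Because the new clause sits inside the outer OR, the existing admin, watchdog, and group predicates keep matching as before.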

Ranger user sync logs

Ranger user sync can run on either of the head nodes. The logs are in /var/log/ranger/usersync/usersync.log. To increase the verbosity of the logs, do the following steps:

  1. Log in to Ambari.
  2. Go to the Ranger configuration section.
  3. Go to the Advanced usersync-log4j section.
  4. Change log4j.rootLogger to the DEBUG level. After the change, it should look like log4j.rootLogger = DEBUG,logFile,FilterLog.
  5. Save the configuration and restart Ranger.

Next steps