LDAP sync in Ranger and Apache Ambari in Azure HDInsight

HDInsight Enterprise Security Package (ESP) clusters use Ranger for authorization. Apache Ambari and Ranger each sync users and groups independently, and they work slightly differently. This article explains how LDAP sync works in Ranger and Ambari and how to troubleshoot it.

General guidelines

  • Always deploy clusters with one or more groups.
  • If you want to use more groups in the cluster, check whether it makes sense to update the group memberships in Azure Active Directory (Azure AD) instead.
  • If you want to change the cluster groups, you can change the sync filters by using Ambari.
  • All group membership changes in Azure AD are reflected in the cluster in subsequent syncs. The changes are synced first to Azure AD Domain Services (Azure AD DS), and then to the clusters.
  • HDInsight clusters use Samba/Winbind to project the group memberships onto the cluster nodes.
  • Group members are synced transitively (all subgroups and their members) to both Ambari and Ranger.
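
To make "transitively" concrete, here is a minimal Python sketch that expands a group into all of its direct and nested user members. The directory contents and group names are hypothetical, and the sketch only models the result of a transitive sync; it is not how Ambari or Ranger query LDAP internally.

```python
from collections import deque

# Hypothetical directory: maps each group to its direct members.
# A member is either a user name or the name of another (nested) group.
DIRECTORY = {
    "hadoopgroup": ["alice", "analysts"],
    "analysts": ["bob", "interns"],
    "interns": ["carol", "analysts"],  # nested groups can form a cycle
}


def transitive_members(group, directory):
    """Return every user reachable from `group` through nested groups."""
    users = set()
    seen_groups = {group}
    queue = deque([group])
    while queue:
        current = queue.popleft()
        for member in directory.get(current, []):
            if member in directory:
                # Nested group: expand it once, which also guards against cycles.
                if member not in seen_groups:
                    seen_groups.add(member)
                    queue.append(member)
            else:
                # Plain user.
                users.add(member)
    return users


print(sorted(transitive_members("hadoopgroup", DIRECTORY)))  # ['alice', 'bob', 'carol']
```

A user such as carol, who is only in a sub-subgroup, is still included, which is what syncing "all the subgroups and their members" means.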

Users are synced separately

  • Ambari and Ranger don't share the user database because they serve two different purposes.
    • If a user needs to use the Ambari UI, the user must be synced to Ambari.
    • If the user isn't synced to Ambari, the Ambari UI/API rejects the user, but other parts of the system still work (they're guarded by Ranger or Resource Manager, not by Ambari).
    • To include users or groups in Ranger policies, the principals must be explicitly synced to Ranger.

Ambari user sync and configuration

From the head nodes, a cron job, /opt/startup_scripts/start_ambari_ldap_sync.py, runs every hour to schedule the user sync. The cron job calls the Ambari REST APIs to perform the sync. The script submits a list of users and a list of groups to sync; both are specified individually because users might not belong to the specified groups. Ambari syncs the sAMAccountName attribute as the username and syncs all group members transitively.
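
As an illustration of the kind of REST call involved, the following Python sketch builds (but does not send) a request to Ambari's `ldap_sync_events` endpoint that syncs a specific list of users and a specific list of groups. This is not the HDInsight script itself: the host name, credentials, and principal names are hypothetical, and the payload shape is an assumption based on Ambari's LDAP sync API, so the script on your head nodes may send something different.

```python
import base64
import json
import urllib.request


def build_ldap_sync_request(ambari_url, admin_user, admin_password, users, groups):
    """Build a POST request asking Ambari to sync specific users and groups."""
    # Users and groups are listed separately, because a user to sync
    # might not be a member of any of the listed groups.
    body = [{
        "Event": {
            "specs": [
                {"principal_type": "users", "sync_type": "specific", "names": ",".join(users)},
                {"principal_type": "groups", "sync_type": "specific", "names": ",".join(groups)},
            ]
        }
    }]
    credentials = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        url=f"{ambari_url}/api/v1/ldap_sync_events",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "X-Requested-By": "ambari",  # Ambari expects this header on modifying requests
            "Content-Type": "application/json",
        },
    )


# Hypothetical values; sending would be urllib.request.urlopen(request).
request = build_ldap_sync_request(
    "http://headnodehost:8080", "admin", "<password>", ["alice", "bob"], ["hadoopgroup"]
)
```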

The logs should be in /var/log/ambari-server/ambari-server.log. For more information, see Configure Ambari logging level.

In Data Lake clusters, a post-user-creation hook creates the home folders for the synced users and sets each user as the owner of their home folder. If a user isn't synced to Ambari correctly, running jobs can fail for that user because the home folder might not be set up correctly.

Ranger user sync and configuration

Ranger has a built-in sync engine that runs every hour to sync users. It doesn't share the user database with Ambari. HDInsight configures the search filter to sync the admin user, the watchdog user, and the members of the group specified during cluster creation. To sync the group members transitively, HDInsight configures Ranger as follows:

  1. Disable incremental sync.
  2. Enable the User group sync map.
  3. Specify the search filter to include the transitive group members.
  4. Sync the sAMAccountName attribute for users and the name attribute for groups.
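
The four settings above can be checked with a small Python sketch against a copy of the Ranger user-sync properties. The property names in `EXPECTED` are assumptions based on Ranger's ranger-ugsync-site configuration and should be verified in the Ranger user-sync configuration section in Ambari; only the transitive `memberOf` matching rule (OID 1.2.840.113556.1.4.1941) comes from the filter shown later in this article.

```python
# Assumed ranger-ugsync-site property names; verify them in Ambari before relying on them.
EXPECTED = {
    "ranger.usersync.ldap.deltasync": "false",                     # step 1: incremental sync off
    "ranger.usersync.group.usermapsyncenabled": "true",            # step 2: user group sync map on
    "ranger.usersync.ldap.user.nameattribute": "sAMAccountName",   # step 4: user name attribute
    "ranger.usersync.group.nameattribute": "name",                 # step 4: group name attribute
}
SEARCH_FILTER_KEY = "ranger.usersync.ldap.user.searchfilter"
# OID of the Active Directory "in chain" matching rule used for transitive membership.
IN_CHAIN_OID = "1.2.840.113556.1.4.1941"


def check_usersync_settings(config):
    """Return a description of every setting that differs from the expected setup."""
    problems = []
    for key, expected in EXPECTED.items():
        actual = config.get(key)
        # Compare case-insensitively: LDAP attribute names are case-insensitive.
        if actual is None or actual.lower() != expected.lower():
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    # Step 3: the user search filter should use the in-chain rule on memberOf.
    if f"memberOf:{IN_CHAIN_OID}:=" not in config.get(SEARCH_FILTER_KEY, ""):
        problems.append(f"{SEARCH_FILTER_KEY}: no transitive memberOf clause")
    return problems
```

An empty list means the copied configuration matches the four steps; anything else names the property to look at first.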

Group or incremental sync

Ranger supports a group sync option, but it works as an intersection with the user filter, not as a union of group memberships and the user filter. A typical use case for the group sync filter in Ranger is a group filter of (dn=clusteradmingroup) combined with a user filter of (city=seattle).
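
The difference between intersection and union can be shown with Python sets. The user names and which filter each one matches are hypothetical; the point is only which users end up synced when both filters are set.

```python
# Hypothetical users matched by each filter on its own.
group_filter_matches = {"alice", "bob", "dave"}  # members of clusteradmingroup
user_filter_matches = {"bob", "carol", "dave"}   # users with city=seattle

# Ranger's behavior with both filters set: only users matching both.
ranger_result = group_filter_matches & user_filter_matches
# A union would also include alice and carol, but that is NOT what Ranger does.
union_result = group_filter_matches | user_filter_matches

print(sorted(ranger_result))  # ['bob', 'dave']
print(sorted(union_result))   # ['alice', 'bob', 'carol', 'dave']
```

So adding a group filter narrows the synced users rather than adding the group's members to them.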

Incremental sync works only for users who were already synced the first time. Incremental sync doesn't sync any new users added to the groups after the initial sync.
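
A toy Python model of the behavior described above: it reproduces only the documented consequence (new group members are missed), not Ranger's actual incremental sync implementation. The user names are hypothetical.

```python
def incremental_sync(synced_users, current_group_members):
    """Toy model: only users synced earlier are updated; new members are missed."""
    updated = synced_users & current_group_members
    missed = current_group_members - synced_users  # never picked up incrementally
    return updated, missed


initial_sync = {"alice", "bob"}          # users synced the first time
members_now = {"alice", "bob", "erin"}   # erin was added to the group later
updated, missed = incremental_sync(initial_sync, members_now)
print(sorted(updated), sorted(missed))   # ['alice', 'bob'] ['erin']
```

This is why HDInsight disables incremental sync: with it enabled, users such as erin would never reach Ranger.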

Update Ranger sync filter

The LDAP filter can be found in the Ambari UI, under the Ranger user-sync configuration section. The existing filter is in the form (|(userPrincipalName=bob@contoso.com)(userPrincipalName=hdiwatchdog-core01@CONTOSO.ONMICROSOFT.COM)(memberOf:1.2.840.113556.1.4.1941:=CN=hadoopgroup,OU=AADDC Users,DC=contoso,DC=onmicrosoft,DC=com)). Make sure to add any new predicate at the end, and test the filter by using the net ads search command, ldp.exe, or a similar tool.
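
Adding a predicate "at the end" means placing it as the last operand inside the outer `(|...)`, before its closing parenthesis. A minimal Python sketch of that string edit follows; the helper name and the extra group `sparkgroup` are hypothetical, and the result still needs to be tested with net ads search, ldp.exe, or a similar tool.

```python
def append_predicate(existing_filter, predicate):
    """Add `predicate` as the last operand of an OR filter of the form (|(...)(...))."""
    if not (existing_filter.startswith("(|") and existing_filter.endswith(")")):
        raise ValueError("expected an OR filter of the form (|(...)...)")
    if not (predicate.startswith("(") and predicate.endswith(")")):
        raise ValueError("predicate must be parenthesized, e.g. (userPrincipalName=...)")
    # Insert just before the closing parenthesis of the outer OR.
    return existing_filter[:-1] + predicate + ")"


existing = (
    "(|(userPrincipalName=bob@contoso.com)"
    "(memberOf:1.2.840.113556.1.4.1941:=CN=hadoopgroup,OU=AADDC Users,DC=contoso,DC=onmicrosoft,DC=com))"
)
# Hypothetical extra group, also matched transitively.
new_group = "(memberOf:1.2.840.113556.1.4.1941:=CN=sparkgroup,OU=AADDC Users,DC=contoso,DC=onmicrosoft,DC=com)"
print(append_predicate(existing, new_group))
```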

Ranger user sync logs

Ranger user sync can run on either head node. The logs are in /var/log/ranger/usersync/usersync.log. To increase the verbosity of the logs, do the following steps:

  1. Sign in to Ambari.
  2. Go to the Ranger configuration section.
  3. Go to the Advanced usersync-log4j section.
  4. Change log4j.rootLogger to the DEBUG level. After the change, it should look like log4j.rootLogger = DEBUG,logFile,FilterLog.
  5. Save the configuration and restart Ranger.
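
Step 4 changes only the level and keeps the appenders. If you copy the Advanced usersync-log4j text out of Ambari to edit it, a Python sketch of that edit could look like the following. The helper name and the starting level `info` are assumptions; you still paste the result back, save it in Ambari, and restart Ranger.

```python
import re

# Matches the start of the log4j.rootLogger line up to the first appender.
ROOT_LOGGER = re.compile(r"^(log4j\.rootLogger\s*=\s*)[^,\s]+", re.MULTILINE)


def set_root_logger_level(log4j_text, level="DEBUG"):
    """Replace the level in the log4j.rootLogger line, keeping its appenders."""
    new_text, count = ROOT_LOGGER.subn(lambda m: m.group(1) + level, log4j_text)
    if count == 0:
        raise ValueError("no log4j.rootLogger line found")
    return new_text


print(set_root_logger_level("log4j.rootLogger = info,logFile,FilterLog"))
# log4j.rootLogger = DEBUG,logFile,FilterLog
```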

Known issues with Ranger user sync

  • If a group name contains Unicode characters, Ranger sync fails to sync that object. If a user belongs to a group whose name has international characters, Ranger syncs only part of the user's group membership.
  • User names (sAMAccountName) and group names (name) must be 20 characters or fewer. If a group name is longer, the user is treated as not belonging to that group when permissions are calculated.
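
Before creating users or groups that need to reach Ranger, you could pre-check their names against these two known issues. The Python sketch below is hypothetical: it treats any non-ASCII character as a "Unicode/international" character, which is an assumption about where the issue begins.

```python
MAX_NAME_LENGTH = 20  # documented limit for sAMAccountName and group name


def name_issues(name):
    """Return the known Ranger user-sync issues that apply to a user or group name."""
    issues = []
    # Assumption: any non-ASCII character counts as a Unicode/international character.
    if not name.isascii():
        issues.append("contains non-ASCII characters")
    if len(name) > MAX_NAME_LENGTH:
        issues.append(f"longer than {MAX_NAME_LENGTH} characters")
    return issues


for candidate in ["hadoopgroup", "données-équipe", "cluster-operators-europe"]:
    print(candidate, name_issues(candidate))
```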

Next steps