Migrate on-premises Apache Hadoop clusters to Azure HDInsight - security and DevOps best practices

This article gives recommendations for security and DevOps in Azure HDInsight systems. It's part of a series that provides best practices to assist with migrating on-premises Apache Hadoop systems to Azure HDInsight.

Secure and govern clusters with Enterprise Security Package

The Enterprise Security Package (ESP) supports Active Directory-based authentication, multiuser support, and role-based access control. With the ESP option chosen, the HDInsight cluster is joined to the Active Directory domain, and the enterprise admin can configure role-based access control (RBAC) for Apache Hive security by using Apache Ranger. The admin can also audit data access by employees and any changes made to access control policies.

ESP is available on the following cluster types: Apache Hadoop, Apache Spark, Apache HBase, Apache Kafka, and Interactive Query (Hive LLAP).

Use the following steps to deploy a domain-joined HDInsight cluster:

  • Deploy Azure Active Directory (AAD) by passing the domain name.

  • Deploy Azure Active Directory Domain Services (AAD DS).

  • Create the required virtual network and subnet.

  • Deploy a VM in the virtual network to manage AAD DS.

  • Join the VM to the domain.

  • Install AD and DNS tools.

  • Have the AAD DS administrator create an organizational unit (OU).

  • Enable LDAPS for AAD DS.

  • Create a service account in Azure Active Directory with delegated read and write admin permission to the OU. This service account can then join machines to the domain and place machine principals within the OU. It can also create service principals within the OU that you specify during cluster creation.

    Note

    The service account does not need to be an AD domain admin account.

  • Deploy the HDInsight ESP cluster by setting the following parameters:

    Parameter | Description
    Domain name | The domain name that's associated with Azure AD DS.
    Domain user name | The service account in the Azure AD DS DC-managed domain that you created in the previous section, for example: hdiadmin@contoso.onmicrosoft.com. This domain user will be the administrator of this HDInsight cluster.
    Domain password | The password of the service account.
    Organizational unit | The distinguished name of the OU that you want to use with the HDInsight cluster, for example: OU=HDInsightOU,DC=contoso,DC=onmicrosoft,DC=com. If this OU doesn't exist, the HDInsight cluster tries to create it using the privileges of the service account.
    LDAPS URL | For example: ldaps://contoso.onmicrosoft.com:636.
    Access user group | The security groups whose users you want to sync to the cluster, for example: HiveUsers. To specify multiple user groups, separate them with a semicolon (;). The groups must exist in the directory before you create the ESP cluster.
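The parameters above follow a few simple formats, so it can help to sanity-check them before starting a deployment. The following is a minimal sketch, not part of any HDInsight SDK: the `EspParameters` class, its field names, and the specific checks are assumptions made for illustration, based only on the examples in the table.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class EspParameters:
    """Hypothetical container for the ESP settings listed in the table."""
    domain_name: str
    domain_user_name: str
    domain_password: str
    organizational_unit: str
    ldaps_url: str
    access_user_groups: str  # semicolon-separated, e.g. "HiveUsers;SparkUsers"

    def groups(self) -> List[str]:
        """Split the semicolon-separated group list, dropping empty entries."""
        return [g.strip() for g in self.access_user_groups.split(";") if g.strip()]

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means the basic checks passed."""
        found = []
        url = urlparse(self.ldaps_url)
        if url.scheme != "ldaps":
            found.append("LDAPS URL must use the ldaps:// scheme")
        if url.port is None:
            found.append("LDAPS URL should include a port, for example 636")
        if not self.organizational_unit.upper().startswith("OU="):
            found.append("organizational unit should be a distinguished name starting with OU=")
        if "@" not in self.domain_user_name:
            found.append("domain user name should look like user@domain")
        if not self.groups():
            found.append("at least one access user group is required")
        return found


params = EspParameters(
    domain_name="contoso.onmicrosoft.com",
    domain_user_name="hdiadmin@contoso.onmicrosoft.com",
    domain_password="<password>",  # placeholder; never hard-code real secrets
    organizational_unit="OU=HDInsightOU,DC=contoso,DC=onmicrosoft,DC=com",
    ldaps_url="ldaps://contoso.onmicrosoft.com:636",
    access_user_groups="HiveUsers; SparkUsers",
)
print(params.groups())
print(params.problems())
```

These checks only catch formatting mistakes. They can't confirm that the groups exist in the directory or that the service account has the delegated permissions on the OU, both of which still need to be verified in AAD DS.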

Implement end-to-end enterprise security

End-to-end enterprise security can be achieved using the following controls:

  • Private and protected data pipeline (perimeter-level security)

    • Perimeter-level security can be achieved through Azure Virtual Networks, Network Security Groups, and the Gateway service.
  • Authentication and authorization for data access

    • Create a domain-joined HDInsight cluster using Azure Active Directory Domain Services (Enterprise Security Package).
    • Use Ambari to provide role-based access to cluster resources for AD users.
    • Use Apache Ranger to set access control policies for Hive at the table, column, or row level.
    • SSH access to the cluster can be restricted to the administrator only.
  • Auditing

    • View and report all access to the HDInsight cluster resources and data.
    • View and report all changes to the access control policies.
  • Encryption

    • Transparent server-side encryption using Microsoft-managed keys or customer-managed keys.
    • In-transit encryption using client-side encryption, HTTPS, and TLS.
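To make the Apache Ranger control above more concrete, the sketch below builds a column-level Hive policy as a JSON document. The field names are modeled on the policy shape used by Apache Ranger's public REST API as commonly documented. Treat them as an assumption and verify them against the Ranger version on your cluster. The service name `mycluster_hive`, the policy name, and the database, table, column, and group values are all hypothetical.

```python
import json
from typing import Dict, List


def hive_column_policy(service: str, name: str, database: str, table: str,
                       columns: List[str], groups: List[str]) -> Dict:
    """Build a policy document granting SELECT on specific Hive columns to groups."""
    def resource(values: List[str]) -> Dict:
        return {"values": list(values), "isExcludes": False, "isRecursive": False}

    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,  # keep access under this policy audited
        "resources": {
            "database": resource([database]),
            "table": resource([table]),
            "column": resource(columns),
        },
        "policyItems": [{
            "groups": list(groups),
            "users": [],
            "accesses": [{"type": "select", "isAllowed": True}],
            "delegateAdmin": False,
        }],
    }


policy = hive_column_policy(
    service="mycluster_hive",        # hypothetical Ranger service name
    name="sales-orders-readonly",    # hypothetical policy name
    database="sales",
    table="orders",
    columns=["order_id", "amount"],
    groups=["HiveUsers"],
)
print(json.dumps(policy, indent=2))
```

Granting access to directory groups rather than individual users keeps the policy aligned with the access user groups synced to the ESP cluster.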

Use monitoring and alerting

For more information, see the article:

Azure Monitor Overview

Upgrade clusters

Regularly upgrade to the latest HDInsight version to take advantage of the latest features. Use the following steps to upgrade the cluster to the latest version:

  1. Create a new TEST HDInsight cluster using the latest available HDInsight version.
  2. Test on the new cluster to make sure that the jobs and workloads work as expected.
  3. Modify jobs, applications, or workloads as required.
  4. Back up any transient data stored locally on the cluster nodes.
  5. Delete the existing cluster.
  6. Create a cluster of the latest HDInsight version in the same virtual network subnet, using the same default data store and metastore as the previous cluster.
  7. Import any transient data that was backed up.
  8. Start jobs or continue processing using the new cluster.
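The ordering of the steps above matters: the existing cluster should only be deleted after testing and the backup have succeeded. The following sketch shows one way to encode that ordering in an automation script. The `run_upgrade` helper and the placeholder step functions are hypothetical. In practice, each step would call whatever tooling you use to manage HDInsight.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_upgrade(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first step that reports failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


# Placeholder actions that always succeed; replace each with real automation.
def succeed() -> bool:
    return True


upgrade_steps: List[Step] = [
    ("create test cluster", succeed),
    ("test jobs and workloads", succeed),
    ("modify jobs as required", succeed),
    ("back up transient data", succeed),
    ("delete existing cluster", succeed),
    ("create new cluster with same storage and metastore", succeed),
    ("import transient data", succeed),
    ("start jobs on new cluster", succeed),
]

done = run_upgrade(upgrade_steps)
print(f"{len(done)} of {len(upgrade_steps)} steps completed")
```

Because the runner stops at the first failure, a failed test or backup step prevents the destructive delete step from running.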

For more information, see the article: Upgrade HDInsight cluster to a new version

Patch cluster operating systems

For more information, see the article: OS patching for HDInsight

Post-migration

  1. Remediate applications - Iteratively make the necessary changes to the jobs, processes, and scripts.
  2. Perform tests - Iteratively run functional and performance tests.
  3. Optimize - Address any performance issues based on the test results, and then retest to confirm the performance improvements.

Next steps