Migrate on-premises Apache Hadoop clusters to Azure HDInsight - security and DevOps best practices

This article gives recommendations for security and DevOps in Azure HDInsight systems. It's part of a series that provides best practices to assist with migrating on-premises Apache Hadoop systems to Azure HDInsight.

Implement end-to-end enterprise security

End-to-end enterprise security can be achieved using the following controls:

  • Private and protected data pipeline (perimeter level security)

    • Perimeter level security can be achieved through Azure Virtual Networks, Network Security Groups, and the Gateway service
  • Authentication and authorization for data access

    • Create a domain-joined HDInsight cluster using Azure Active Directory Domain Services (Enterprise Security Package).
    • Use Ambari to provide role-based access to cluster resources for AD users
    • Use Apache Ranger to set access control policies for Hive at the table / column / row level.
    • SSH access to the cluster can be restricted to administrators only.
  • Auditing

    • View and report all access to the HDInsight cluster resources and data.
    • View and report all changes to the access control policies
  • Encryption

    • Transparent server-side encryption using Microsoft-managed keys or customer-managed keys.
    • In-transit encryption using client-side encryption, HTTPS, and TLS
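The perimeter controls above (Network Security Groups on the cluster subnet, SSH limited to administrators) can be expressed as a small set of NSG rules. The script below is an added example written for this page, not part of the original article or of any Microsoft sample; the resource group, NSG name and administrator address range are constructed placeholders, and the commands are only printed unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Added example (not from the article): Azure CLI commands for perimeter
# rules on the subnet that hosts an HDInsight cluster. Resource names and
# the admin address range below are constructed placeholders.
set -eu

RG="${RG:-hdi-rg}"                          # placeholder resource group
NSG="${NSG:-hdi-subnet-nsg}"                # placeholder NSG attached to the cluster subnet
ADMIN_CIDR="${ADMIN_CIDR:-203.0.113.0/24}"  # placeholder: admin jump-host range (TEST-NET-3)
APPLY="${APPLY:-0}"                         # 0 = print commands only, 1 = execute them

# run: echo the az command; execute it only when APPLY=1.
run() {
  printf '%s ' "$@"
  printf '\n'
  if [ "$APPLY" = "1" ]; then "$@"; fi
}

# 1. HDInsight's management service must reach the cluster gateway on 443.
#    "HDInsight" is the Azure service tag for those management addresses,
#    so the rule follows Microsoft's address list instead of hard-coded IPs.
rule_allow_hdinsight_mgmt() {
  run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
    --name AllowHDInsightMgmt --priority 300 --direction Inbound --access Allow \
    --protocol Tcp --source-address-prefixes HDInsight --destination-port-ranges 443
}

# 2. SSH only from the administrators' address range, never from the Internet.
rule_allow_ssh_admin() {
  run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
    --name AllowSshAdminOnly --priority 310 --direction Inbound --access Allow \
    --protocol Tcp --source-address-prefixes "$ADMIN_CIDR" --destination-port-ranges 22
}

# 3. Explicit deny for all other inbound Internet traffic. The higher priority
#    number means it is evaluated after the two allow rules above.
rule_deny_internet() {
  run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
    --name DenyInternetInbound --priority 4000 --direction Inbound --access Deny \
    --protocol '*' --source-address-prefixes Internet --destination-port-ranges '*'
}

apply_perimeter_rules() {
  rule_allow_hdinsight_mgmt
  rule_allow_ssh_admin
  rule_deny_internet
}

apply_perimeter_rules
```

Review the printed commands, then rerun with APPLY=1 to create the rules. The NSG's built-in DenyAllInBound rule already blocks everything not explicitly allowed; the explicit deny at priority 4000 makes the intent visible and means a permissive rule added later with a number above 4000 cannot reopen a port. These rules cover inbound traffic only; the cluster also needs outbound reachability to Azure services, which they do not restrict.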

For more information, see the following articles:

Use monitoring & alerting

For more information, see the article:

Azure Monitor Overview

Upgrade clusters

Regularly upgrade to the latest HDInsight version to take advantage of the latest features. The following steps can be used to upgrade a cluster to the latest version:

  1. Create a new TEST HDInsight cluster using the latest available HDInsight version.
  2. Test on the new cluster to make sure that the jobs and workloads work as expected.
  3. Modify jobs, applications, or workloads as required.
  4. Back up any transient data stored locally on the cluster nodes.
  5. Delete the existing cluster.
  6. Create a cluster of the latest HDInsight version in the same virtual network subnet, using the same default data store and metastore as the previous cluster.
  7. Import any transient data that was backed up.
  8. Start jobs/continue processing using the new cluster.
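The eight steps above as one Azure CLI script, added here as an example and not taken from the article or from Microsoft's own tooling. Every cluster, network, storage and metastore name, the target version, the metastore user, the scratch path and the SSH user are constructed placeholders; secrets are read from the environment and redacted when commands are echoed. Nothing is executed unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Added example (not from the article): the eight upgrade steps as Azure CLI
# calls. Cluster, network, storage and metastore names and the target
# version are constructed placeholders; secrets come from the environment
# and are never echoed.
set -eu

RG="${RG:-hdi-rg}"                                # placeholder resource group
OLD_CLUSTER="${OLD_CLUSTER:-hdi-prod}"            # placeholder: cluster being replaced
NEW_CLUSTER="${NEW_CLUSTER:-hdi-prod-v2}"         # placeholder: its replacement
NEW_VERSION="${NEW_VERSION:-4.0}"                 # placeholder: target HDInsight version
VNET="${VNET:-hdi-vnet}"; SUBNET="${SUBNET:-hdi-subnet}"   # same subnet as the old cluster
STORAGE="${STORAGE:-hdistore}"                    # placeholder: existing default storage account
CONTAINER="${CONTAINER:-hdi-prod}"                # placeholder: old cluster's default container
METASTORE_SERVER="${METASTORE_SERVER:-hdi-sql}"   # placeholder: existing Azure SQL server
METASTORE_DB="${METASTORE_DB:-hivemetastore}"     # placeholder: existing Hive metastore database
STORAGE_KEY="${STORAGE_KEY:-set-STORAGE_KEY-in-env}"
METASTORE_PASSWORD="${METASTORE_PASSWORD:-set-METASTORE_PASSWORD-in-env}"
HTTP_PASSWORD="${HTTP_PASSWORD:-set-HTTP_PASSWORD-in-env}"
APPLY="${APPLY:-0}"                               # 0 = print commands only, 1 = execute them

# run: echo the command with secret-bearing arguments redacted, then execute
# it only when APPLY=1. Redaction keys off the flag that precedes the value.
run() {
  prev=""
  for arg in "$@"; do
    case "$prev" in
      *password|*account-key) printf '%s ' '<redacted>' ;;
      *)                      printf '%s ' "$arg" ;;
    esac
    prev="$arg"
  done
  printf '\n'
  if [ "$APPLY" = "1" ]; then "$@"; fi
}

# Steps 1 and 6: create a cluster on the target version in the same subnet.
# Arguments: cluster name, default storage container, Hive metastore DB.
# Pointing the replacement at the old cluster's container and metastore is
# what carries data and table definitions over without copying them.
create_cluster() {
  run az hdinsight create --name "$1" --resource-group "$RG" \
    --type hadoop --version "$NEW_VERSION" \
    --vnet-name "$VNET" --subnet "$SUBNET" \
    --storage-account "$STORAGE" --storage-account-key "$STORAGE_KEY" \
    --storage-default-container "$2" \
    --hive-metastore-server-name "$METASTORE_SERVER" \
    --hive-metastore-db-name "$3" \
    --hive-metastore-user hiveadmin --hive-metastore-password "$METASTORE_PASSWORD" \
    --http-user admin --http-password "$HTTP_PASSWORD" \
    --ssh-user sshuser --ssh-public-key "${SSH_PUBKEY:-./id_rsa.pub}"
}

# Steps 4 and 7: only data on the nodes' local disks dies with the cluster,
# so copy it to the shared storage account and back. The scratch path and
# SSH user are placeholders; <name>-ssh.azurehdinsight.net is the cluster's
# SSH endpoint.
backup_transient() {
  run ssh "sshuser@${OLD_CLUSTER}-ssh.azurehdinsight.net" \
    hdfs dfs -copyFromLocal -f /mnt/scratch "/backup/${OLD_CLUSTER}"
}
restore_transient() {
  run ssh "sshuser@${NEW_CLUSTER}-ssh.azurehdinsight.net" \
    hdfs dfs -copyToLocal "/backup/${OLD_CLUSTER}" /mnt/scratch
}

# Step 5
delete_cluster() {
  run az hdinsight delete --name "$1" --resource-group "$RG" --yes
}

# Steps 2 and 3 (testing and remediation) are manual gates: the script stops
# after the TEST cluster exists unless the operator has signed them off.
# The TEST cluster gets its own container and a copy of the metastore so the
# production data and table definitions are untouched during testing.
upgrade() {
  create_cluster "${OLD_CLUSTER}-test" "${CONTAINER}-test" "${METASTORE_DB}-test"  # step 1
  if [ "${TESTS_PASSED:-0}" != "1" ]; then                                          # steps 2-3
    echo "Run the workloads on ${OLD_CLUSTER}-test; rerun with TESTS_PASSED=1 to cut over."
    return 0
  fi
  backup_transient                                       # step 4
  delete_cluster "$OLD_CLUSTER"                          # step 5
  create_cluster "$NEW_CLUSTER" "$CONTAINER" "$METASTORE_DB"   # step 6
  restore_transient                                      # step 7
  echo "Resubmit jobs to ${NEW_CLUSTER}."                # step 8
}

upgrade
```

The first run only creates the TEST cluster and prints the commands; after the workloads have been validated on it, a second run with TESTS_PASSED=1 and APPLY=1 performs the backup, deletion, re-creation and restore in the order the steps prescribe. Passing the HTTP and metastore passwords on the command line exposes them to process listings on the operator's machine, which is why the echo path redacts them; on a shared jump host, prefer passing them through the CLI's interactive prompts instead.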

For more information, see the article: Upgrade HDInsight cluster to a new version

Patch cluster operating systems

For more information, see the article: OS patching for HDInsight

Post-Migration

  1. Remediate applications - Iteratively make the necessary changes to the jobs, processes, and scripts.
  2. Perform tests - Iteratively run functional and performance tests.
  3. Optimize - Address any performance issues based on the above test results, then retest to confirm the performance improvements.

Next steps