High availability services supported by Azure HDInsight

To provide optimal levels of availability for your analytics components, HDInsight was developed with a unique architecture for ensuring high availability (HA) of critical services. Microsoft developed some components of this architecture to provide automatic failover. Other components are standard Apache components that are deployed to support specific services. This article explains the architecture of the HA service model in HDInsight, how HDInsight supports failover for HA services, and best practices for recovering from other service interruptions.


Bias-free communication

Microsoft supports a diverse and inclusionary environment. This article contains references to the word slave. The Microsoft style guide for bias-free communication recognizes this as an exclusionary word. The word is used in this article for consistency because it's currently the word that appears in the software. When the software is updated to remove the word, this article will be updated to be in alignment.

High availability infrastructure

HDInsight provides customized infrastructure to ensure that four primary services are highly available with automatic failover capabilities:

  • Apache Ambari server
  • Application Timeline Server for Apache YARN
  • Job History Server for Hadoop MapReduce
  • Apache Livy

This infrastructure consists of several services and software components, some of which are designed by Microsoft. The following components are unique to the HDInsight platform:

  • Slave failover controller
  • Master failover controller
  • Slave high availability service
  • Master high availability service


Other high availability services are supported by open-source Apache reliability components. These components are also present on HDInsight clusters:

  • Hadoop Distributed File System (HDFS) NameNode
  • YARN ResourceManager
  • HBase Master

The following sections describe in more detail how these services work together.

HDInsight high availability services

Microsoft provides support for the four Apache services listed in the following table in HDInsight clusters. To distinguish them from the high availability services supported by Apache components, they're called HDInsight HA services.

Service | Cluster nodes | Cluster types | Purpose
Apache Ambari server | Active headnode | All | Monitors and manages the cluster.
Application Timeline Server for Apache YARN | Active headnode | All except Kafka | Maintains debugging information about YARN jobs running on the cluster.
Job History Server for Hadoop MapReduce | Active headnode | All except Kafka | Maintains debugging data for MapReduce jobs.
Apache Livy | Active headnode | Spark | Enables easy interaction with a Spark cluster over a REST interface.


HDInsight Enterprise Security Package (ESP) clusters currently provide high availability only for the Ambari server. Application Timeline Server, Job History Server, and Livy all run only on headnode0, and they don't fail over to headnode1 when Ambari fails over. The application timeline database is also on headnode0, not on the Ambari SQL server.


Each HDInsight cluster has two headnodes, one in active mode and one in standby mode. The HDInsight HA services run on headnodes only. These services should always be running on the active headnode, and they should be stopped and put in maintenance mode on the standby headnode.

To maintain the correct states of HA services and provide fast failover, HDInsight uses Apache ZooKeeper, a coordination service for distributed applications, to conduct active headnode election. HDInsight also provisions a few background Java processes that coordinate the failover procedure for HDInsight HA services. These processes are the master failover controller, the slave failover controller, the master-ha-service, and the slave-ha-service.

Apache ZooKeeper

Apache ZooKeeper is a high-performance coordination service for distributed applications. In production, ZooKeeper usually runs in replicated mode, where a replicated group of ZooKeeper servers forms a quorum. Each HDInsight cluster has three ZooKeeper nodes, which allow three ZooKeeper servers to form a quorum. HDInsight has two ZooKeeper quorums running in parallel with each other. The first quorum decides which headnode in the cluster is the active headnode, on which the HDInsight HA services should run. The second quorum coordinates the HA services provided by Apache, as detailed in later sections.
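
As a rough illustration of why three ZooKeeper servers are enough to form a quorum, the majority arithmetic can be sketched in Python. ZooKeeper in replicated mode needs a strict majority of its servers to be available. The function names below are illustrative only and aren't part of ZooKeeper or HDInsight.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    servers = 3  # each HDInsight cluster has three ZooKeeper nodes
    print("majority needed:", quorum_size(servers))
    print("failures tolerated:", tolerated_failures(servers))
```

With three servers, two must agree, so the ensemble keeps working if one ZooKeeper node is lost.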

Slave failover controller

The slave failover controller runs on every node in an HDInsight cluster. This controller is responsible for starting the Ambari agent and the slave-ha-service on each node. It periodically queries the first ZooKeeper quorum about the active headnode. When the active and standby headnodes change, the slave failover controller performs the following steps:

  1. Updates the host configuration file.
  2. Restarts the Ambari agent.

The slave-ha-service is responsible for stopping the HDInsight HA services (except Ambari server) on the standby headnode.
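
The polling behavior of the slave failover controller described above can be modeled with a small Python sketch. This is a toy model under stated assumptions, not the HDInsight implementation: the class name, the hostnames `hn0` and `hn1`, and the recorded action strings are all hypothetical, and the two steps are only recorded rather than executed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SlaveFailoverControllerModel:
    """Toy model of the slave failover controller on one node (hypothetical)."""

    known_active: str
    actions: List[str] = field(default_factory=list)

    def poll(self, active_from_quorum: str) -> bool:
        """Handle one periodic query result; return True if a change was handled."""
        if active_from_quorum == self.known_active:
            return False  # active headnode unchanged, nothing to do
        self.known_active = active_from_quorum
        # The two documented steps, recorded here instead of executed.
        self.actions.append(
            f"update host configuration file (active={active_from_quorum})"
        )
        self.actions.append("restart Ambari agent")
        return True


if __name__ == "__main__":
    controller = SlaveFailoverControllerModel(known_active="hn0")
    controller.poll("hn0")  # no change observed
    controller.poll("hn1")  # failover observed
    for action in controller.actions:
        print(action)
```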

Master failover controller

A master failover controller runs on each of the two headnodes. Both master failover controllers communicate with the first ZooKeeper quorum, each nominating the headnode it runs on as the active headnode.

For example, if the master failover controller on headnode 0 wins the election, the following changes take place:

  1. Headnode 0 becomes active.
  2. The master failover controller starts Ambari server and the master-ha-service on headnode 0.
  3. The other master failover controller stops Ambari server and the master-ha-service on headnode 1.

The master-ha-service runs only on the active headnode. It stops the HDInsight HA services (except Ambari server) on the standby headnode and starts them on the active headnode.
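
The outcome of an election, as described above, can be summarized as a placement plan. The following Python sketch is a simplified model and not HDInsight code: the function name, the hostnames `hn0` and `hn1`, and the dictionary keys are assumptions made for illustration.

```python
from typing import Dict, Sequence


def apply_election(
    winner: str, headnodes: Sequence[str] = ("hn0", "hn1")
) -> Dict[str, Dict[str, str]]:
    """Return the expected state of each headnode after `winner` is elected."""
    if winner not in headnodes:
        raise ValueError(f"unknown headnode: {winner}")
    plan: Dict[str, Dict[str, str]] = {}
    for node in headnodes:
        is_active = node == winner
        state = "started" if is_active else "stopped"
        plan[node] = {
            "role": "active" if is_active else "standby",
            # Started or stopped by the master failover controller.
            "ambari_server": state,
            "master_ha_service": state,
            # Started on the active headnode by the master-ha-service.
            "hdinsight_ha_services": state,
        }
    return plan


if __name__ == "__main__":
    for node, state in apply_election("hn0").items():
        print(node, state)
```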

The failover process

A health monitor runs on each headnode along with the master failover controller and sends heartbeat notifications to the ZooKeeper quorum. The headnode is regarded as an HA service in this scenario. The health monitor checks whether each high availability service is healthy and ready to join the leadership election. If it is, the headnode competes in the election. If not, the headnode withdraws from the election until it becomes ready again.

If the standby headnode achieves leadership and becomes active (for example, when the previous active node fails), its master failover controller starts all HDInsight HA services on it. The master failover controller also stops these services on the other headnode.
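
The node-level election described above can be sketched as a small Python function. This is a simplified model, not the actual election protocol: it assumes that a leader that is still ready keeps leadership, and it breaks ties between ready candidates by name, which is an arbitrary choice made only to keep the example deterministic. The function name and the hostnames are hypothetical.

```python
from typing import Dict, Optional


def elect_active(current_active: Optional[str], ready: Dict[str, bool]) -> Optional[str]:
    """Pick the active headnode from the headnodes that report ready.

    `ready` maps each headnode name to the result of its health monitor check.
    Returns None when no headnode is ready to compete.
    """
    candidates = sorted(node for node, is_ready in ready.items() if is_ready)
    if current_active in candidates:
        return current_active  # assumption: a ready leader keeps leadership
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Both headnodes ready: the current active headnode stays active.
    print(elect_active("hn0", {"hn0": True, "hn1": True}))
    # The active headnode fails its health check: the standby takes over.
    print(elect_active("hn0", {"hn0": False, "hn1": True}))
```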

For HDInsight HA service failures, such as a service being down or unhealthy, the master failover controller should automatically restart or stop the services according to the headnode status. Users shouldn't manually start HDInsight HA services on both headnodes. Instead, allow automatic or manual failover to help the service recover.

Inadvertent manual intervention

HDInsight HA services should run only on the active headnode and are automatically restarted when necessary. Because individual HA services don't have their own health monitors, failover can't be triggered at the level of an individual service. Failover is ensured at the node level, not at the service level.

Some known issues

  • If you manually start an HA service on the standby headnode, it won't stop until the next failover happens. When HA services run on both headnodes, potential problems include an inaccessible Ambari UI, Ambari errors, and stuck YARN, Spark, and Oozie jobs.

  • When an HA service on the active headnode stops, it won't restart until the next failover happens or until the master failover controller or master-ha-service restarts. When one or more HA services stop on the active headnode, especially Ambari server, the Ambari UI becomes inaccessible. Other potential problems include failures of YARN, Spark, and Oozie jobs.
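
Both known issues amount to a mismatch between where the HA services are expected to run and where they actually run. The following Python sketch shows one way such a check could be expressed. It's a hypothetical helper that works on data you supply. It doesn't query Ambari or the cluster, and the function name, hostnames, and service names are assumptions.

```python
from typing import Dict, List, Set


def find_ha_violations(
    active: str,
    standby: str,
    expected: Set[str],
    running: Dict[str, Set[str]],
) -> List[str]:
    """Compare observed service placement with the expected HA placement.

    `expected` is the set of HDInsight HA services for the cluster type.
    `running` maps each headnode to the set of services observed running on it.
    """
    problems: List[str] = []
    # First known issue: an HA service is running on the standby headnode.
    for service in sorted(expected & running.get(standby, set())):
        problems.append(f"{service} is running on standby headnode {standby}")
    # Second known issue: an HA service is stopped on the active headnode.
    for service in sorted(expected - running.get(active, set())):
        problems.append(f"{service} is stopped on active headnode {active}")
    return problems


if __name__ == "__main__":
    observed = {
        "hn0": {"Ambari server"},
        "hn1": {"Job History Server"},
    }
    services = {"Ambari server", "Job History Server"}
    for problem in find_ha_violations("hn0", "hn1", services, observed):
        print(problem)
```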

Apache high availability services

Apache provides high availability for HDFS NameNode, YARN ResourceManager, and HBase Master, which are also available in HDInsight clusters. Unlike HDInsight HA services, they're supported in ESP clusters. Apache HA services communicate with the second ZooKeeper quorum (described in the preceding section) to elect active and standby states and to conduct automatic failover. The following sections detail how these services work.

Hadoop Distributed File System (HDFS) NameNode

HDInsight clusters based on Apache Hadoop 2.0 or later provide NameNode high availability. Two NameNodes run on the headnodes and are configured for automatic failover. The NameNodes use the ZKFailoverController to communicate with ZooKeeper and elect their active or standby status. The ZKFailoverController runs on both headnodes and works in the same way as the master failover controller described earlier.

The second ZooKeeper quorum is independent of the first quorum, so the active NameNode isn't necessarily on the active headnode. When the active NameNode is dead or unhealthy, the standby NameNode wins the election and becomes active.

YARN ResourceManager

HDInsight clusters based on Apache Hadoop 2.4 or later support YARN ResourceManager high availability. Two ResourceManagers, rm1 and rm2, run on headnode 0 and headnode 1, respectively. Like NameNode, YARN ResourceManager is configured for automatic failover. When the current active ResourceManager goes down or becomes unresponsive, the other ResourceManager is automatically elected to be active.

YARN ResourceManager uses its embedded ActiveStandbyElector as a failure detector and leader elector. Unlike HDFS NameNode, YARN ResourceManager doesn't need a separate ZKFailoverController (ZKFC) daemon. The active ResourceManager writes its state into Apache ZooKeeper.

The high availability of the YARN ResourceManager is independent of NameNode and the HDInsight HA services. The active ResourceManager isn't necessarily on the active headnode or on the headnode where the active NameNode is running. For more information about YARN ResourceManager high availability, see ResourceManager High Availability.
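
Because the active ResourceManager and the active NameNode are elected independently, it can be useful to check where each one is currently active. The Hadoop command-line tools `yarn rmadmin -getServiceState <id>` and `hdfs haadmin -getServiceState <id>` report the HA state of a given service ID. The following Python sketch wraps them under a few assumptions: the commands are on the path of the node where the script runs, each prints a single word such as `active` or `standby`, and the NameNode IDs `nn1` and `nn2` are placeholders that depend on the cluster configuration. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its standard output as text."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def active_id(
    tool_argv: Sequence[str],
    service_ids: Sequence[str],
    runner: Runner = run_cli,
) -> Optional[str]:
    """Return the first service ID whose reported state is 'active', if any."""
    for service_id in service_ids:
        output = runner([*tool_argv, "-getServiceState", service_id])
        # Assumption: the tool prints just the state word.
        if output.strip().lower() == "active":
            return service_id
    return None


if __name__ == "__main__":
    # rm1 and rm2 are the ResourceManager IDs named in this article.
    print("active ResourceManager:", active_id(["yarn", "rmadmin"], ["rm1", "rm2"]))
    # nn1 and nn2 are assumed NameNode IDs; check your cluster configuration.
    print("active NameNode:", active_id(["hdfs", "haadmin"], ["nn1", "nn2"]))
```

The `runner` parameter exists so the parsing logic can be exercised without a cluster, for example by passing a function that returns canned output.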

HBase Master

HDInsight HBase clusters support HBase Master high availability. Unlike other HA services, which run on headnodes, HBase Masters run on the three ZooKeeper nodes. One of them is the active master, and the other two are on standby. Like NameNode, HBase Master coordinates with Apache ZooKeeper for leader election and performs automatic failover when the current active master has problems. Only one HBase Master is active at any time.

Next steps