Availability and reliability of Apache Hadoop clusters in HDInsight

HDInsight clusters provide two head nodes to increase the availability and reliability of Apache Hadoop services and running jobs.

Hadoop achieves high availability and reliability by replicating services and data across multiple nodes in a cluster. However, standard distributions of Hadoop typically have only a single head node. Any outage of the single head node can cause the cluster to stop working. HDInsight provides two head nodes to improve Hadoop's availability and reliability.

Important

Linux is the only operating system used on HDInsight version 3.4 or greater. For more information, see HDInsight retirement on Windows.

Availability and reliability of nodes

Nodes in an HDInsight cluster are implemented using Azure Virtual Machines. The following sections discuss the individual node types used with HDInsight.

Note

Not all node types are used by every cluster type. For example, a Hadoop cluster type does not have any Nimbus nodes. For more information on the nodes used by HDInsight cluster types, see the Cluster types section of the Create Linux-based Hadoop clusters in HDInsight document.

Head nodes

To ensure high availability of Hadoop services, HDInsight provides two head nodes. Both head nodes are active and running within the HDInsight cluster simultaneously. Some services, such as Apache HDFS or Apache Hadoop YARN, are only 'active' on one head node at any given time. Other services, such as HiveServer2 or the Hive Metastore, are active on both head nodes at the same time.

Head nodes (and other nodes in HDInsight) have a numeric value as part of the hostname of the node. For example, hn0-CLUSTERNAME or hn4-CLUSTERNAME.

Important

Do not associate the numeric value with whether a node is primary or secondary. The numeric value is only present to provide a unique name for each node.

Nimbus nodes

Nimbus nodes are available with Apache Storm clusters. The Nimbus nodes provide functionality similar to the Hadoop JobTracker by distributing and monitoring processing across worker nodes. HDInsight provides two Nimbus nodes for Storm clusters.

Apache ZooKeeper nodes

ZooKeeper nodes are used for leader election of master services on the head nodes. They are also used to ensure that services, data (worker) nodes, and gateways know which head node a master service is active on. By default, HDInsight provides three ZooKeeper nodes.

Worker nodes

Worker nodes perform the actual data analysis when a job is submitted to the cluster. If a worker node fails, the task that it was performing is submitted to another worker node. By default, HDInsight creates four worker nodes. You can change this number to suit your needs both during and after cluster creation.

Edge node

An edge node does not actively participate in data analysis within the cluster. It is used by developers or data scientists when working with Hadoop. The edge node lives in the same Azure Virtual Network as the other nodes in the cluster, and can directly access all other nodes. The edge node can be used without taking resources away from critical Hadoop services or analysis jobs.

Currently, ML Services on HDInsight is the only cluster type that provides an edge node by default. For ML Services on HDInsight, the edge node is used to test R code locally on the node before submitting it to the cluster for distributed processing.

For information on using an edge node with other cluster types, see the Use edge nodes in HDInsight document.

Accessing the nodes

Access to the cluster over the internet is provided through a public gateway. Access is limited to connecting to the head nodes and (if one exists) the edge node. Access to services running on the head nodes is not affected by having multiple head nodes. The public gateway routes requests to the head node that hosts the requested service. For example, if Apache Ambari is currently hosted on the secondary head node, the gateway routes incoming requests for Ambari to that node.

Access over the public gateway is limited to ports 443 (HTTPS), 22, and 23.

  • Port 443 is used to access Ambari and other web UIs or REST APIs hosted on the head nodes.

  • Port 22 is used to access the primary head node or edge node with SSH.

  • Port 23 is used to access the secondary head node with SSH. For example, ssh username@mycluster-ssh.azurehdinsight.cn (using the default port 22) connects to the primary head node of the cluster named mycluster.

For more information on using SSH, see the Use SSH with HDInsight document.

Internal fully qualified domain names (FQDN)

Nodes in an HDInsight cluster have an internal IP address and FQDN that can only be accessed from within the cluster. When accessing services on the cluster using the internal FQDN or IP address, you should use Ambari to verify the IP or FQDN to use when accessing the service.

For example, the Apache Oozie service can only run on one head node, and using the oozie command from an SSH session requires the URL of the service. This URL can be retrieved from Ambari by using the following command:

curl -u admin:PASSWORD "https://CLUSTERNAME.azurehdinsight.cn/api/v1/clusters/CLUSTERNAME/configurations?type=oozie-site&tag=TOPOLOGY_RESOLVED" | grep oozie.base.url

This command returns a value similar to the following, which contains the internal URL to use with the oozie command:

"oozie.base.url": "http://hn0-CLUSTERNAME-randomcharacters.cx.internal.chinacloudapp.cn:11000/oozie"
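In a script, the URL can be extracted from the Ambari response rather than read by eye. The following Python sketch parses a trimmed, hypothetical sample of the configurations response; a real script would first fetch the JSON from the REST API with the curl command shown above (or an HTTP library).

```python
import json

# Trimmed, hypothetical sample of the Ambari configurations response;
# a real script would retrieve this from the REST API first.
sample_response = '''
{
  "items": [
    {
      "type": "oozie-site",
      "properties": {
        "oozie.base.url": "http://hn0-CLUSTERNAME-randomcharacters.cx.internal.chinacloudapp.cn:11000/oozie"
      }
    }
  ]
}
'''

def get_oozie_base_url(response_text):
    """Return the oozie.base.url property from a configurations response."""
    data = json.loads(response_text)
    for item in data["items"]:
        if item.get("type") == "oozie-site":
            return item["properties"]["oozie.base.url"]
    return None

print(get_oozie_base_url(sample_response))
```

The returned internal URL can then be passed to the oozie command with its -oozie parameter.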

For more information on working with the Ambari REST API, see Monitor and manage HDInsight using the Apache Ambari REST API.

Accessing other node types

You can connect to nodes that are not directly accessible over the internet by using the following methods:

  • SSH: Once connected to a head node using SSH, you can then use SSH from the head node to connect to other nodes in the cluster. For more information, see the Use SSH with HDInsight document.

  • SSH tunnel: If you need to access a web service hosted on one of the nodes that is not exposed to the internet, you must use an SSH tunnel. For more information, see the Use an SSH tunnel with HDInsight document.

  • Azure Virtual Network: If your HDInsight cluster is part of an Azure Virtual Network, any resource on the same Virtual Network can directly access all nodes in the cluster. For more information, see the Extend HDInsight using Azure Virtual Network document.

How to check on a service status

To check the status of services that run on the head nodes, use the Ambari Web UI or the Ambari REST API.

Ambari Web UI

The Ambari Web UI is viewable at https://CLUSTERNAME.azurehdinsight.cn. Replace CLUSTERNAME with the name of your cluster. If prompted, enter the HTTP user credentials for your cluster. The default HTTP user name is admin and the password is the password you entered when creating the cluster.

When you arrive on the Ambari page, the installed services are listed on the left of the page.

(Image: installed services)

There are a series of icons that may appear next to a service to indicate status. Any alerts related to a service can be viewed using the Alerts link at the top of the page. Ambari offers several predefined alerts.

The following alerts help monitor the availability of a cluster:

  • Metric Monitor Status: This alert indicates the status of the Metrics Monitor process as determined by the monitor status script.
  • Ambari Agent Heartbeat: This alert is triggered if the server has lost contact with an agent.
  • ZooKeeper Server Process: This host-level alert is triggered if the ZooKeeper server process cannot be determined to be up and listening on the network.
  • IOCache Metadata Server Status: This host-level alert is triggered if the IOCache Metadata Server cannot be determined to be up and responding to client requests.
  • JournalNode Web UI: This host-level alert is triggered if the JournalNode Web UI is unreachable.
  • Spark2 Thrift Server: This host-level alert is triggered if the Spark2 Thrift Server cannot be determined to be up.
  • History Server Process: This host-level alert is triggered if the History Server process cannot be established to be up and listening on the network.
  • History Server Web UI: This host-level alert is triggered if the History Server Web UI is unreachable.
  • ResourceManager Web UI: This host-level alert is triggered if the ResourceManager Web UI is unreachable.
  • NodeManager Health Summary: This service-level alert is triggered if there are unhealthy NodeManagers.
  • App Timeline Web UI: This host-level alert is triggered if the App Timeline Server Web UI is unreachable.
  • DataNode Health Summary: This service-level alert is triggered if there are unhealthy DataNodes.
  • NameNode Web UI: This host-level alert is triggered if the NameNode Web UI is unreachable.
  • ZooKeeper Failover Controller Process: This host-level alert is triggered if the ZooKeeper Failover Controller process cannot be confirmed to be up and listening on the network.
  • Oozie Server Web UI: This host-level alert is triggered if the Oozie server Web UI is unreachable.
  • Oozie Server Status: This host-level alert is triggered if the Oozie server cannot be determined to be up and responding to client requests.
  • Hive Metastore Process: This host-level alert is triggered if the Hive Metastore process cannot be determined to be up and listening on the network.
  • HiveServer2 Process: This host-level alert is triggered if HiveServer2 cannot be determined to be up and responding to client requests.
  • WebHCat Server Status: This host-level alert is triggered if the Templeton server status is not healthy.
  • Percent ZooKeeper Servers Available: This alert is triggered if the number of down ZooKeeper servers in the cluster is greater than the configured critical threshold. It aggregates the results of ZooKeeper process checks.
  • Spark2 Livy Server: This host-level alert is triggered if the Livy2 Server cannot be determined to be up.
  • Spark2 History Server: This host-level alert is triggered if the Spark2 History Server cannot be determined to be up.
  • Metrics Collector Process: This alert is triggered if the Metrics Collector cannot be confirmed to be up and listening on the configured port for a number of seconds equal to the threshold.
  • Metrics Collector - HBase Master Process: This alert is triggered if the Metrics Collector's HBase master process cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.
  • Percent Metrics Monitors Available: This alert is triggered if a percentage of Metrics Monitor processes are not up and listening on the network for the configured warning and critical thresholds.
  • Percent NodeManagers Available: This alert is triggered if the number of down NodeManagers in the cluster is greater than the configured critical threshold. It aggregates the results of NodeManager process checks.
  • NodeManager Health: This host-level alert checks the node health property available from the NodeManager component.
  • NodeManager Web UI: This host-level alert is triggered if the NodeManager Web UI is unreachable.
  • NameNode High Availability Health: This service-level alert is triggered if either the Active NameNode or the Standby NameNode is not running.
  • DataNode Process: This host-level alert is triggered if an individual DataNode process cannot be established to be up and listening on the network.
  • DataNode Web UI: This host-level alert is triggered if the DataNode Web UI is unreachable.
  • Percent JournalNodes Available: This alert is triggered if the number of down JournalNodes in the cluster is greater than the configured critical threshold. It aggregates the results of JournalNode process checks.
  • Percent DataNodes Available: This alert is triggered if the number of down DataNodes in the cluster is greater than the configured critical threshold. It aggregates the results of DataNode process checks.
  • Zeppelin Server Status: This host-level alert is triggered if the Zeppelin server cannot be determined to be up and responding to client requests.
  • HiveServer2 Interactive Process: This host-level alert is triggered if HiveServerInteractive cannot be determined to be up and responding to client requests.
  • LLAP Application: This alert is triggered if the LLAP application cannot be determined to be up and responding to requests.

You can select each service to view more information on it.

While the service page provides information on the status and configuration of each service, it does not provide information on which head node the service is running on. To view this information, use the Hosts link at the top of the page. This page displays hosts within the cluster, including the head nodes.

(Image: host list)

Selecting the link for one of the head nodes displays the services and components running on that node.

(Image: component status)

Ambari REST API

The Ambari REST API is available over the internet. The HDInsight public gateway handles routing requests to the head node that is currently hosting the REST API.

You can use the following command to check the state of a service through the Ambari REST API:

curl -u admin:PASSWORD https://CLUSTERNAME.azurehdinsight.cn/api/v1/clusters/CLUSTERNAME/services/SERVICENAME?fields=ServiceInfo/state
  • Replace PASSWORD with the HTTP user (admin) account password.
  • Replace CLUSTERNAME with the name of the cluster.
  • Replace SERVICENAME with the name of the service you want to check the status of.

For example, to check the status of the HDFS service on a cluster named mycluster, with a password of password, you would use the following command:

curl -u admin:password https://mycluster.azurehdinsight.cn/api/v1/clusters/mycluster/services/HDFS?fields=ServiceInfo/state

The response is similar to the following JSON:

{
  "href" : "http://hn0-CLUSTERNAME.randomcharacters.cx.internal.chinacloudapp.cn:8080/api/v1/clusters/mycluster/services/HDFS?fields=ServiceInfo/state",
  "ServiceInfo" : {
    "cluster_name" : "mycluster",
    "service_name" : "HDFS",
    "state" : "STARTED"
  }
}
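A monitoring script can parse this response rather than read it by hand. The following Python sketch works on a local copy of the sample response shown above; a real script would first retrieve the JSON from the gateway with the curl command (or an HTTP library), supplying the cluster credentials.

```python
import json

# Hypothetical sample response, matching the shape shown above;
# a real script would fetch this from the Ambari REST API.
response_text = '''
{
  "href": "http://hn0-CLUSTERNAME.randomcharacters.cx.internal.chinacloudapp.cn:8080/api/v1/clusters/mycluster/services/HDFS?fields=ServiceInfo/state",
  "ServiceInfo": {
    "cluster_name": "mycluster",
    "service_name": "HDFS",
    "state": "STARTED"
  }
}
'''

info = json.loads(response_text)["ServiceInfo"]
is_healthy = info["state"] == "STARTED"
print(f'{info["service_name"]} on {info["cluster_name"]}: {info["state"]}')
# → HDFS on mycluster: STARTED
```

A script like this could raise an alert, or retry after a delay, whenever is_healthy is False.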

The URL tells us that the service is currently running on a head node named hn0-CLUSTERNAME.

The state tells us that the service is currently running, or STARTED.

If you do not know what services are installed on the cluster, you can use the following command to retrieve a list:

curl -u admin:PASSWORD https://CLUSTERNAME.azurehdinsight.cn/api/v1/clusters/CLUSTERNAME/services

For more information on working with the Ambari REST API, see Monitor and manage HDInsight using the Apache Ambari REST API.

Service components

Services may contain components that you want to check the status of individually. For example, HDFS contains the NameNode component. To view information on a component, use the following command:

curl -u admin:PASSWORD https://CLUSTERNAME.azurehdinsight.cn/api/v1/clusters/CLUSTERNAME/services/SERVICE/components/COMPONENT

If you do not know what components are provided by a service, you can use the following command to retrieve a list:

curl -u admin:PASSWORD https://CLUSTERNAME.azurehdinsight.cn/api/v1/clusters/CLUSTERNAME/services/SERVICE/components
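The components response is a JSON list that a script can walk to get each component name. A minimal Python sketch, assuming a response shaped like the other Ambari responses in this article (the sample below is a trimmed, hypothetical HDFS response):

```python
import json

# Trimmed, hypothetical sample of a components listing for HDFS;
# a real script would fetch this from the Ambari REST API.
sample = '''
{
  "items": [
    { "ServiceComponentInfo": { "component_name": "NAMENODE" } },
    { "ServiceComponentInfo": { "component_name": "DATANODE" } }
  ]
}
'''

def list_components(response_text):
    """Return the component names from a services/SERVICE/components response."""
    data = json.loads(response_text)
    return [item["ServiceComponentInfo"]["component_name"] for item in data["items"]]

print(list_components(sample))  # → ['NAMENODE', 'DATANODE']
```

Each returned name can then be substituted for COMPONENT in the per-component status request shown earlier.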

How to access log files on the head nodes

SSH

While connected to a head node through SSH, log files can be found under /var/log. For example, /var/log/hadoop-yarn/yarn contains the logs for YARN.

Each head node can have unique log entries, so you should check the logs on both.

SFTP

You can also connect to the head node using the SSH File Transfer Protocol or Secure File Transfer Protocol (SFTP), and download the log files directly.

Similar to using an SSH client, when connecting to the cluster you must provide the SSH user account name and the SSH address of the cluster. For example, sftp username@mycluster-ssh.azurehdinsight.cn. Provide the password for the account when prompted, or specify a private key file using the -i parameter.

Once connected, you are presented with an sftp> prompt. From this prompt, you can change directories and upload and download files. For example, the following commands change directories to the /var/log/hadoop/hdfs directory and then download all files in the directory.

cd /var/log/hadoop/hdfs
get *

For a list of available commands, enter help at the sftp> prompt.

Note

There are also graphical interfaces that allow you to visualize the file system when connected using SFTP. For example, MobaXTerm allows you to browse the file system using an interface similar to Windows Explorer.

Ambari

Note

To access log files using Ambari, you must use an SSH tunnel. The web interfaces for the individual services are not exposed publicly on the internet. For information on using an SSH tunnel, see the Use an SSH tunnel with HDInsight document.

From the Ambari Web UI, select the service you wish to view logs for (for example, YARN). Then use Quick Links to select which head node to view the logs for.

(Image: viewing logs using Quick Links)

How to configure the node size

The size of a node can only be selected during cluster creation. You can find a list of the different VM sizes available for HDInsight on the HDInsight pricing page.

When creating a cluster, you can specify the size of the nodes. The following information provides guidance on how to specify the size using the Azure portal, the Azure PowerShell Az module, and the Azure CLI:

  • Azure portal: When creating a cluster, you can set the size of the nodes used by the cluster:

    (Image: cluster creation wizard with the node size options)

  • Azure CLI: When using the az hdinsight create command, you can set the size of the head, worker, and ZooKeeper nodes by using the --headnode-size, --workernode-size, and --zookeepernode-size parameters.

  • Azure PowerShell: When using the New-AzHDInsightCluster cmdlet, you can set the size of the head, worker, and ZooKeeper nodes by using the -HeadNodeSize, -WorkerNodeSize, and -ZookeeperNodeSize parameters.

Next steps

Use the following links to learn more about the topics mentioned in this document.