Access Apache Hadoop YARN application logs on Linux-based HDInsight

Learn how to access the logs for Apache Hadoop YARN (Yet Another Resource Negotiator) applications on an Apache Hadoop cluster in Azure HDInsight.

What is Apache YARN?

YARN supports multiple programming models (Apache Hadoop MapReduce being one of them) by decoupling resource management from application scheduling and monitoring. YARN uses a global ResourceManager (RM), per-worker-node NodeManagers (NMs), and per-application ApplicationMasters (AMs). The per-application AM negotiates with the RM for the resources (CPU, memory, disk, network) needed to run your application. The RM works with the NMs to grant these resources in the form of containers. The AM is responsible for tracking the progress of the containers assigned to it by the RM. An application may require many containers, depending on its nature.

Each application may consist of multiple application attempts. If an application fails, it may be retried as a new attempt. Each attempt runs in a container. In a sense, a container provides the context for the basic unit of work performed by a YARN application. All work done within the context of a container is performed on the single worker node on which the container was allocated. See Hadoop: Writing YARN Applications, or Apache Hadoop YARN, for further reference.

To scale your cluster to support greater processing throughput, see Scale your clusters manually using a few different languages.

YARN Timeline Server

The Apache Hadoop YARN Timeline Server provides generic information on completed applications.

YARN Timeline Server includes the following types of data:

  • The application ID, a unique identifier of an application
  • The user who started the application
  • Information on attempts made to complete the application
  • The containers used by any given application attempt

YARN applications and logs

For background on applications, application attempts, and containers, see Apache Hadoop YARN Concepts.

Application logs (and the associated container logs) are critical in debugging problematic Hadoop applications. YARN provides a framework for collecting, aggregating, and storing application logs with the Log Aggregation feature, which makes accessing application logs more deterministic. Log Aggregation collects the logs from all containers on a worker node and stores them as one aggregated log file per worker node. The log is stored on the default file system after an application finishes. Your application may use hundreds or thousands of containers, but the logs for all containers run on a single worker node are always aggregated into a single file. So there is only one log per worker node used by your application.

Log Aggregation is enabled by default on HDInsight clusters version 3.0 and above. Aggregated logs are located in the default storage for the cluster. The following path is the HDFS path to the logs:

/app-logs/<user>/logs/<applicationId>

In the path, user is the name of the user who started the application. The applicationId is the unique identifier assigned to an application by the YARN RM.
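As a minimal sketch of how the path is assembled, the following shell snippet fills in the two placeholders and, when the hdfs client is available (for example, in an SSH session on the cluster), lists the aggregated log files. The helper name app_log_dir and the example user and application ID are assumptions made for illustration; they are not part of Hadoop or HDInsight.

```shell
# Sketch: build the aggregated-log directory for one application.
# app_log_dir is a hypothetical helper name, not a Hadoop command.
app_log_dir() {
  # $1 = user who started the application, $2 = YARN application ID
  printf '/app-logs/%s/logs/%s\n' "$1" "$2"
}

# Example values (assumptions for illustration only).
log_dir="$(app_log_dir hive application_1490377567345_0007)"
echo "$log_dir"

# List the aggregated files only when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls "$log_dir" || echo "Could not list $log_dir" >&2
fi
```

Listing the directory should show one aggregated file per worker node that ran containers for the application.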

The aggregated logs aren't directly readable, because they're written in TFile, a binary format indexed by container. Use the YARN ResourceManager logs or CLI tools to view these logs as plain text for applications or containers of interest.

YARN CLI tools

  1. Use the ssh command to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:

    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.cn
    
  2. List the application IDs of the currently running YARN applications with the following command:

    yarn top
    

    In the APPLICATIONID column, note the ID of the application whose logs you want to download.

    YARN top - 18:00:07, up 19d, 0:14, 0 active users, queue(s): root
    NodeManager(s): 4 total, 4 active, 0 unhealthy, 0 decommissioned, 0 lost, 0 rebooted
    Queue(s) Applications: 2 running, 10 submitted, 0 pending, 8 completed, 0 killed, 0 failed
    Queue(s) Mem(GB): 97 available, 3 allocated, 0 pending, 0 reserved
    Queue(s) VCores: 58 available, 2 allocated, 0 pending, 0 reserved
    Queue(s) Containers: 2 allocated, 0 pending, 0 reserved
    
                      APPLICATIONID USER             TYPE      QUEUE   #CONT  #RCONT  VCORES RVCORES     MEM    RMEM  VCORESECS    MEMSECS %PROGR       TIME NAME
     application_1490377567345_0007 hive            spark  thriftsvr       1       0       1       0      1G      0G    1628407    2442611  10.00   18:20:20 Thrift JDBC/ODBC Server
     application_1490377567345_0006 hive            spark  thriftsvr       1       0       1       0      1G      0G    1628430    2442645  10.00   18:20:20 Thrift JDBC/ODBC Server
    
  3. You can view these logs as plain text by running one of the following commands:

    yarn logs -applicationId <applicationId> -appOwner <user-who-started-the-application>
    yarn logs -applicationId <applicationId> -appOwner <user-who-started-the-application> -containerId <containerId> -nodeAddress <worker-node-address>
    

Specify the <applicationId>, <user-who-started-the-application>, <containerId>, and <worker-node-address> information when running these commands.
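When the application ID has to be picked out of command output by a script rather than by eye, a pattern match is usually enough. The sketch below assumes that application IDs follow the application_<digits>_<digits> form seen in the sample output above; extract_app_ids is a hypothetical helper name, and the live query with yarn application -list runs only when the yarn client is available.

```shell
# Sketch: pull YARN application IDs out of arbitrary text on stdin.
# extract_app_ids is a hypothetical helper, not a YARN command.
extract_app_ids() {
  # Print each unique ID of the form application_<digits>_<digits>.
  grep -oE 'application_[0-9]+_[0-9]+' | sort -u
}

# Two rows copied (and shortened) from the sample yarn top output.
sample=' application_1490377567345_0007 hive spark thriftsvr
 application_1490377567345_0006 hive spark thriftsvr'
printf '%s\n' "$sample" | extract_app_ids

# On a cluster node, the same filter can be fed from the YARN CLI.
if command -v yarn >/dev/null 2>&1; then
  yarn application -list -appStates RUNNING 2>/dev/null | extract_app_ids
fi
```

Each printed ID can then be passed to yarn logs as the -applicationId value.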

Other sample commands

  1. Download the YARN container logs for all application masters with the command below. This creates a text-format log file named amlogs.txt.

    yarn logs -applicationId <application_id> -am ALL > amlogs.txt
    
  2. Download the YARN container logs for only the latest application master with the following command:

    yarn logs -applicationId <application_id> -am -1 > latestamlogs.txt
    
  3. Download the YARN container logs for the first two application masters with the following command:

    yarn logs -applicationId <application_id> -am 1,2 > first2amlogs.txt
    
  4. Download all YARN container logs with the following command:

    yarn logs -applicationId <application_id> > logs.txt
    
  5. Download the YARN container log for a particular container with the following command:

    yarn logs -applicationId <application_id> -containerId <container_id> > containerlogs.txt
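Once a log has been saved as text, ordinary text tools apply to it. The following is a rough sketch of scanning a downloaded file for common failure markers. The helper name find_log_errors, the marker list (ERROR, FATAL, Exception), and the demo log lines are all assumptions made for illustration; real application logs may use different wording.

```shell
# Sketch: print line-numbered matches for common failure markers.
# find_log_errors is a hypothetical helper; the marker list is an assumption.
find_log_errors() {
  grep -nE 'ERROR|FATAL|Exception' "$1"
}

# Demo on a small made-up file; in practice, pass logs.txt or amlogs.txt.
demo_log="$(mktemp)"
printf '%s\n' 'INFO starting task' 'ERROR task failed' 'INFO done' > "$demo_log"
find_log_errors "$demo_log"   # prints: 2:ERROR task failed
rm -f "$demo_log"
```

The line numbers in the output make it easier to open the full log at the point of failure.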
    

YARN ResourceManager UI

The YARN ResourceManager UI runs on the cluster head node. You access it through the Ambari web UI. Use the following steps to view the YARN logs:

  1. In your web browser, navigate to https://CLUSTERNAME.azurehdinsight.cn. Replace CLUSTERNAME with the name of your HDInsight cluster.

  2. From the list of services on the left, select YARN.

    Yarn service selected

  3. From the Quick Links dropdown, select one of the cluster head nodes, and then select ResourceManager Log.

    Yarn quick links

    A list of links to the YARN logs is displayed.