Enable heap dumps for Apache Hadoop services on Linux-based HDInsight

Heap dumps contain a snapshot of the application's memory, including the values of variables at the time the dump was created, which makes them useful for diagnosing problems that occur at run time.

Services

You can enable heap dumps for the following services:

  • Apache HCatalog - templeton
  • Apache Hive - hiveserver2, metastore, derbyserver
  • MapReduce - jobhistoryserver
  • Apache YARN - resourcemanager, nodemanager, timelineserver
  • Apache HDFS - datanode, secondarynamenode, namenode

You can also enable heap dumps for the map and reduce processes run by HDInsight.

Understanding heap dump configuration

Heap dumps are enabled by passing options (sometimes called opts or parameters) to the JVM when a service is started. For most Apache Hadoop services, you can modify the shell script used to start the service to pass these options.

In each script, there is an export for *_OPTS, which contains the options passed to the JVM. For example, in the hadoop-env.sh script, the line that begins with export HADOOP_NAMENODE_OPTS= contains the options for the NameNode service.
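As a minimal sketch, the NameNode export line in hadoop-env.sh might look like the following after the heap dump options described in this article have been added. The exact set of pre-existing options varies between cluster versions, so the rest of the string is a placeholder:

```shell
# Hypothetical excerpt from hadoop-env.sh: prepend the heap dump options to
# whatever options the cluster already sets for the NameNode service.
export HADOOP_NAMENODE_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp ${HADOOP_NAMENODE_OPTS}"
echo "HADOOP_NAMENODE_OPTS=${HADOOP_NAMENODE_OPTS}"
```

Other services follow the same pattern with their own *_OPTS export (for example, HADOOP_DATANODE_OPTS).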

Map and reduce processes are slightly different, because these operations are child processes of the MapReduce service. Each map or reduce process runs in a child container, and there are two entries that contain the JVM options. Both are contained in mapred-site.xml:

  • mapreduce.admin.map.child.java.opts
  • mapreduce.admin.reduce.child.java.opts
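In mapred-site.xml, the two entries might look like the following sketch with the heap dump options appended; the pre-existing values on your cluster will differ, so treat the values shown as placeholders:

```xml
<!-- Hypothetical mapred-site.xml fragment: heap dump options for the
     map and reduce child containers. Existing values should be kept. -->
<property>
  <name>mapreduce.admin.map.child.java.opts</name>
  <value>-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp</value>
</property>
<property>
  <name>mapreduce.admin.reduce.child.java.opts</name>
  <value>-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp</value>
</property>
```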

Note

We recommend using Apache Ambari to modify both the scripts and the mapred-site.xml settings, because Ambari handles replicating the changes across the nodes in the cluster. See the Using Apache Ambari section for specific steps.

Enable heap dumps

The following option enables a heap dump when an OutOfMemoryError occurs:

-XX:+HeapDumpOnOutOfMemoryError

The + indicates that the option is enabled; by default it is disabled.

Warning

Heap dumps are not enabled for Hadoop services on HDInsight by default, because the dump files can be large. If you do enable them for troubleshooting, remember to disable them once you have reproduced the problem and gathered the dump files.

Dump location

The default location for the dump file is the current working directory. You can control where the file is stored using the following option:

-XX:HeapDumpPath=/path

For example, using -XX:HeapDumpPath=/tmp causes the dumps to be stored in the /tmp directory.
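Because the dump file is written by the account the service runs as, the directory named in -XX:HeapDumpPath must exist and be writable by that account. A small sketch of preparing such a directory (the /tmp/heapdumps path is only an illustration):

```shell
# Create a shared directory for heap dumps (hypothetical path).
# Mode 1777 (sticky bit) lets any service account write files there while
# preventing accounts from removing each other's dumps.
DUMP_DIR=/tmp/heapdumps
mkdir -p "$DUMP_DIR"
chmod 1777 "$DUMP_DIR"
ls -ld "$DUMP_DIR"
```

You would then use -XX:HeapDumpPath=/tmp/heapdumps in the service's *_OPTS entry.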

Scripts

You can also trigger a script when an OutOfMemoryError occurs, for example to send a notification so that you know the error has happened. Use the following option to trigger a script on an OutOfMemoryError:

-XX:OnOutOfMemoryError=/path/to/script

Note

Since Apache Hadoop is a distributed system, any script you use must be placed on every node in the cluster that the service runs on.

The script must also be in a location that is accessible by the account the service runs as, and that account must have execute permissions on it. For example, you might store scripts in /usr/local/bin and use chmod go+rx /usr/local/bin/filename.sh to grant read and execute permissions.
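As an illustration, a script passed to -XX:OnOutOfMemoryError might simply record the event; the log path and message format here are hypothetical, and a real script might instead send mail or call a monitoring webhook:

```shell
#!/bin/bash
# Hypothetical notification script for -XX:OnOutOfMemoryError.
# Appends a timestamped record of the OutOfMemoryError, with the host name
# and the parent (JVM) process id, to a log file.
LOGFILE="${OOME_LOGFILE:-/tmp/oome-events.log}"
printf '%s OutOfMemoryError on host %s (pid %s)\n' \
  "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$(hostname)" "$PPID" >> "$LOGFILE"
```

After copying the script to every node and setting permissions as described above, you would add an option such as -XX:OnOutOfMemoryError=/usr/local/bin/oome.sh to the same *_OPTS entry as the other heap dump options.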

Using Apache Ambari

To modify the configuration for a service, use the following steps:

  1. Open the Ambari web UI for your cluster. The URL is https://YOURCLUSTERNAME.azurehdinsight.cn.

    When prompted, authenticate to the site using the HTTP account name (default: admin) and password for your cluster.

    Note

    Ambari may prompt a second time for the user name and password. If so, enter the same account name and password.

  2. Using the list on the left, select the service area you want to modify, for example, HDFS. In the center area, select the Configs tab.

    Image of the Ambari web UI with the HDFS Configs tab selected

  3. In the Filter... entry, enter opts. Only items containing this text are displayed.

    Image of the filtered list

  4. Find the *_OPTS entry for the service you want to enable heap dumps for, and add the options you want to enable. In the following image, -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ has been added to the HADOOP_NAMENODE_OPTS entry:

    HADOOP_NAMENODE_OPTS with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/

    Note

    When enabling heap dumps for the map or reduce child processes, look for the fields named mapreduce.admin.map.child.java.opts and mapreduce.admin.reduce.child.java.opts.

    Use the Save button to save your changes. You can enter a short note describing the changes.

  5. Once the changes have been applied, the Restart required icon appears beside one or more services.

    Image of the restart required icon and the Restart button

  6. Select each service that needs a restart, and use the Service Actions button to Turn On Maintenance Mode. Maintenance mode prevents alerts from being generated by the service when it is restarted.

    Image of the Turn On Maintenance Mode menu

  7. Once you have enabled maintenance mode, use the Restart button for the service and select Restart All Affected.

    Image of the Restart All Affected entry

    Note

    The entries for the Restart button may be different for other services.

  8. Once the services have been restarted, use the Service Actions button to Turn Off Maintenance Mode. This allows Ambari to resume monitoring alerts for the service.