Interactive Query cluster sizing guide in Azure HDInsight

This document describes how to size an HDInsight Interactive Query (LLAP) cluster for a typical workload to achieve reasonable performance. The recommendations are general guidelines; specific workloads may need specific tuning.

Default VM types for Interactive Query

Node Type | Instance | Size
Head | D13 v2 | 8 vCPUs, 56 GB RAM, 400 GB SSD
Worker | D14 v2 | 16 vCPUs, 112 GB RAM, 800 GB SSD
ZooKeeper | A4 v2 | 4 vCPUs, 8 GB RAM, 40 GB SSD

The recommended configuration values are based on the D14 v2 worker node type.

Key | Value | Description
yarn.nodemanager.resource.memory-mb | 102400 (MB) | Total memory given, in MB, for all YARN containers on a node.
yarn.scheduler.maximum-allocation-mb | 102400 (MB) | The maximum allocation for every container request at the Resource Manager, in MB. Memory requests higher than this value won't take effect.
yarn.scheduler.maximum-allocation-vcores | 12 | The maximum number of CPU cores for every container request at the Resource Manager. Requests higher than this value won't take effect.
yarn.scheduler.capacity.root.llap.capacity | 90% | YARN capacity allocation for the LLAP queue.
hive.server2.tez.sessions.per.default.queue | number_of_worker_nodes | The number of sessions for each queue named in hive.server2.tez.default.queues. This number corresponds to the number of query coordinators (Tez AMs).
tez.am.resource.memory.mb | 4096 (MB) | The amount of memory, in MB, to be used by the Tez AppMaster.
hive.tez.container.size | 4096 (MB) | Specified Tez container size, in MB.
hive.llap.daemon.num.executors | 12 | Number of executors per LLAP daemon.
hive.llap.io.threadpool.size | 12 | Thread pool size for executors.
hive.llap.daemon.yarn.container.mb | 86016 (MB) | Total memory, in MB, used by individual LLAP daemons (memory per daemon).
hive.llap.io.memory.size | 409600 (MB) | Cache size, in MB, per LLAP daemon, provided SSD cache is enabled.
hive.auto.convert.join.noconditionaltask.size | 2048 (MB) | Memory size, in MB, available for Map Join.

LLAP daemon size estimations

yarn.nodemanager.resource.memory-mb

This value indicates the maximum total memory, in MB, that YARN containers can use on each node. It specifies the amount of memory YARN can use on the node, so it should be less than the node's total memory.

Set this value = [Total physical memory on node] - [memory for OS + other services].

It's recommended to set this value to ~90% of the available RAM. For D14 v2, the recommended value is 102400 MB.
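As a rough illustration of the ~90% guideline, the following Python sketch derives the D14 v2 value from its 112 GB of RAM. The function name and the rounding down to a whole number of GB are assumptions made for this example; the guide itself only states the ~90% rule and the resulting 102400 MB.

```python
def yarn_node_memory_mb(total_ram_gb: int, percent: int = 90) -> int:
    """Estimate yarn.nodemanager.resource.memory-mb from a node's RAM."""
    usable_mb = total_ram_gb * 1024 * percent // 100
    # Assumption: round down to a whole number of GB (a multiple of 1024 MB).
    return (usable_mb // 1024) * 1024


print(yarn_node_memory_mb(112))  # D14 v2 worker node: 102400
```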

yarn.scheduler.maximum-allocation-mb

This value indicates the maximum allocation for every container request at the Resource Manager, in MB. Memory requests higher than the specified value won't take effect. The Resource Manager can only give memory to containers in increments of yarn.scheduler.minimum-allocation-mb and can't exceed the size specified by yarn.scheduler.maximum-allocation-mb. This value shouldn't be more than the total memory given to the node, which is specified by yarn.nodemanager.resource.memory-mb.

For D14 v2 worker nodes, the recommended value is 102400 MB.
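The increment and ceiling rules can be modeled with a small Python sketch. It is a simplified model, not YARN's scheduler code: the function name, the 1024 MB minimum allocation, and the choice to raise an error for oversized requests are all assumptions made for illustration.

```python
import math


def normalize_request_mb(requested_mb: int,
                         min_alloc_mb: int = 1024,    # assumed minimum allocation
                         max_alloc_mb: int = 102400) -> int:
    """Round a container memory request up to the next allocation increment."""
    if requested_mb > max_alloc_mb:
        # Modeling choice: treat a request above the maximum as an error.
        raise ValueError(f"{requested_mb} MB exceeds the {max_alloc_mb} MB maximum")
    increments = max(1, math.ceil(requested_mb / min_alloc_mb))
    return increments * min_alloc_mb


print(normalize_request_mb(4096))  # 4096
print(normalize_request_mb(5000))  # 5120
```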

yarn.scheduler.maximum-allocation-vcores

This configuration indicates the maximum number of virtual CPU cores for every container request at the Resource Manager. Requesting a higher value than this configuration won't take effect. This configuration is a global property of the YARN scheduler. For the LLAP daemon container, this value can be set to 75% of the total available virtual cores (VCORES). The remaining 25% should be reserved for NodeManager, DataNode, and other services running on the worker nodes.

D14 v2 worker nodes have 16 VCORES, and 75% of them can be given to the container. So the recommended value for the LLAP daemon container is 12.
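The 75% rule is simple arithmetic; a minimal Python sketch follows. The function name is hypothetical, and rounding down for core counts that don't divide evenly is an assumption.

```python
def llap_max_vcores(total_vcores: int, percent: int = 75) -> int:
    """VCORES for the LLAP daemon container; the rest stay with node services."""
    # Integer division rounds down when the result is not a whole number.
    return total_vcores * percent // 100


print(llap_max_vcores(16))  # D14 v2 worker node: 12
```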

hive.server2.tez.sessions.per.default.queue

This configuration value determines the number of Tez sessions that should be launched in parallel for each of the queues specified by hive.server2.tez.default.queues. The value corresponds to the number of Tez AMs (query coordinators). It's recommended to set it to the number of worker nodes so that there is one Tez AM per node. The number of Tez AMs can be higher than the number of LLAP daemon nodes. Their primary responsibility is to coordinate the query execution and assign query plan fragments to the corresponding LLAP daemons for execution. It's recommended to keep this value a multiple of the number of LLAP daemon nodes to achieve higher throughput.

A default HDInsight cluster has four LLAP daemons running on four worker nodes, so the recommended value is 4.

tez.am.resource.memory.mb, hive.tez.container.size

tez.am.resource.memory.mb defines the Tez Application Master size.
The recommended value is 4096 MB.

hive.tez.container.size defines the amount of memory given to a Tez container. This value must be set between the YARN minimum container size (yarn.scheduler.minimum-allocation-mb) and the YARN maximum container size (yarn.scheduler.maximum-allocation-mb).
It's recommended to set it to 4096 MB.

A general rule is to keep it less than the amount of memory per processor, assuming one processor per container. Reserve memory for the Tez AMs on a node before giving memory to the LLAP daemon. For instance, if you're using two Tez AMs (4 GB each) per node, give 82 GB out of 90 GB to the LLAP daemon and reserve 8 GB for the two Tez AMs.
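The reservation in the example above can be written out as a short Python sketch. The function and parameter names are hypothetical; the 90 GB and 4 GB figures come from the example in the text.

```python
def llap_memory_after_tez_ams_gb(llap_queue_gb: int,
                                 tez_ams_per_node: int,
                                 tez_am_gb: int = 4) -> int:
    """Memory left for the LLAP daemon once the Tez AMs on a node are reserved."""
    reserved_gb = tez_ams_per_node * tez_am_gb
    return llap_queue_gb - reserved_gb


# Two Tez AMs of 4 GB each on a node with 90 GB for the LLAP queue.
print(llap_memory_after_tez_ams_gb(90, 2))  # 82
```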

yarn.scheduler.capacity.root.llap.capacity

This value indicates the percentage of capacity given to the LLAP queue. The HDInsight Interactive Query cluster gives 90% of the total capacity to the LLAP queue, and the remaining 10% is assigned to the default queue for other container allocations.
For D14 v2 worker nodes, the recommended value is 90 for the LLAP queue.

hive.llap.daemon.yarn.container.mb

The total memory size for the LLAP daemon depends on the following components:

  • Configuration of YARN container size (yarn.scheduler.minimum-allocation-mb, yarn.scheduler.maximum-allocation-mb, yarn.nodemanager.resource.memory-mb)

  • Heap memory used by executors (Xmx)

    The amount of RAM available to the daemon after subtracting the headroom.
    For D14 v2, HDI 4.0 - this value is (86 GB - 6 GB) = 80 GB.
    For D14 v2, HDI 3.6 - this value is (84 GB - 6 GB) = 78 GB.

  • Off-heap in-memory cache per daemon (hive.llap.io.memory.size)

  • Headroom

    It's a portion of off-heap memory used for Java VM overhead (metaspace, thread stacks, GC data structures, and so on). This portion is observed to be around 6% of the heap size (Xmx). To be on the safer side, it can be calculated as 6% of the total LLAP daemon memory size. This is possible when SSD cache is enabled, because the LLAP daemon can then use all of its available in-memory space for heap only.
    For D14 v2, the recommended value is ceil(86 GB x 0.06) ~= 6 GB.

Memory per daemon = [In-memory cache size] + [Heap size] + [Headroom].

It can be calculated as follows:

Tez AM memory per node = [(Number of Tez AMs / Number of LLAP daemon nodes) * Tez AM size]. LLAP daemon container size = [90% of YARN max container memory] - [Tez AM memory per node].

For a D14 v2 worker node, HDI 4.0 - the recommended value is (90 GB - (1/1 * 4 GB)) = 86 GB. (For HDI 3.6, the recommended value is 84 GB because you should reserve ~2 GB for the slider AM.)
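The formulas above can be combined into one Python sketch that reproduces the D14 v2 figures quoted in this section. The function and parameter names are hypothetical, the 100 GB YARN maximum is the 102400 MB setting expressed in GB, and the slider AM reservation is modeled as a simple subtraction, which is an assumption about how the 84 GB figure for HDI 3.6 is reached.

```python
import math


def llap_daemon_sizing_gb(yarn_max_container_gb: int,
                          tez_ams: int,
                          llap_nodes: int,
                          tez_am_gb: int = 4,
                          slider_am_gb: int = 0,
                          headroom_fraction: float = 0.06):
    """Return (container, headroom, heap) sizes in GB for one LLAP daemon."""
    llap_share_gb = yarn_max_container_gb * 90 / 100      # 90% of YARN max container
    tez_am_per_node_gb = (tez_ams / llap_nodes) * tez_am_gb
    container_gb = llap_share_gb - tez_am_per_node_gb - slider_am_gb
    headroom_gb = math.ceil(container_gb * headroom_fraction)
    heap_gb = container_gb - headroom_gb                   # no in-memory cache (SSD cache case)
    return container_gb, headroom_gb, heap_gb


print(llap_daemon_sizing_gb(100, tez_ams=4, llap_nodes=4))                  # HDI 4.0
print(llap_daemon_sizing_gb(100, tez_ams=4, llap_nodes=4, slider_am_gb=2))  # HDI 3.6
```

With these inputs the sketch yields an 86 GB container, 6 GB headroom, and 80 GB heap for HDI 4.0, and 84 GB, 6 GB, and 78 GB for HDI 3.6.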

hive.llap.io.memory.size

This configuration is the amount of memory available as cache for the LLAP daemon. The LLAP daemons can use SSD as a cache. Setting hive.llap.io.allocator.mmap = true enables SSD caching. The D14 v2 comes with ~800 GB of SSD, and SSD caching is enabled by default for the Interactive Query (LLAP) cluster. It's configured to use 50% of the SSD space for off-heap cache.

For D14 v2, the recommended value is 409600 MB.
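The 409600 MB figure follows directly from taking 50% of the ~800 GB SSD, as this small Python sketch shows. The function name is hypothetical.

```python
def llap_ssd_cache_mb(ssd_gb: int, percent: int = 50) -> int:
    """Cache size in MB when a share of the local SSD backs the LLAP cache."""
    return ssd_gb * 1024 * percent // 100


print(llap_ssd_cache_mb(800))  # D14 v2: 409600
```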

For other VMs, with no SSD caching enabled, it's beneficial to give a portion of the available RAM to LLAP caching to achieve better performance. Adjust the total memory size for the LLAP daemon as follows:

Total LLAP daemon memory = [LLAP cache size] + [Heap size] + [Headroom].

It's recommended to adjust the cache size and the heap size to best suit your workload.

hive.llap.daemon.num.executors

This configuration controls the number of executors that can execute tasks in parallel per LLAP daemon. This value is a balance between the number of available VCORES, the amount of memory given per executor, and the total memory available per LLAP daemon. Usually, we would like this value to be as close as possible to the number of cores.

For D14 v2, there are 16 VCORES available, but not all of them can be given to executors. The worker nodes also run other services, such as NodeManager, DataNode, and Metrics Monitor, which need some portion of the available VCORES. This value can be configured up to 75% of the total VCORES available on the node.

For D14 v2, the recommended value is (0.75 x 16) = 12.

It's recommended that you reserve ~6 GB of heap space per executor. Adjust your number of executors based on the available LLAP daemon size and the number of available VCORES per node.
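One way to balance the two constraints is to take the smaller of the CPU-based and memory-based limits, as in the Python sketch below. Taking the minimum is an assumption made for this example, and the function name is hypothetical; the 80 GB heap is the HDI 4.0 figure for D14 v2 given earlier.

```python
def llap_num_executors(total_vcores: int,
                       heap_gb: int,
                       heap_per_executor_gb: int = 6,
                       vcores_percent: int = 75) -> int:
    """Pick an executor count bounded by both CPU and heap."""
    by_cpu = total_vcores * vcores_percent // 100   # at most 75% of the VCORES
    by_memory = heap_gb // heap_per_executor_gb     # ~6 GB of heap per executor
    return min(by_cpu, by_memory)


print(llap_num_executors(16, 80))  # D14 v2, HDI 4.0: 12
```

On a node with less heap, for example 48 GB, the memory limit of 8 executors would apply instead of the CPU limit.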

hive.llap.io.threadpool.size

This value specifies the thread pool size for executors. Because the number of executors is fixed as specified, this value is the same as the number of executors per LLAP daemon.

For D14 v2, it's recommended to set this value to 12.

This configuration can't exceed the yarn.nodemanager.resource.cpu-vcores value.

hive.auto.convert.join.noconditionaltask.size

Make sure you have hive.auto.convert.join.noconditionaltask enabled for this parameter to take effect. This configuration allows the user to specify the size of the tables that can fit in memory to do a Map Join. If the sum of the sizes of n-1 of the tables/partitions in an n-way join is less than the configured value, the Map Join is chosen. The LLAP executor memory size should be used to calculate the threshold for autoconversion to Map Join.

For D14 v2, it's recommended to set this value to 2048 MB.

We recommend adjusting this value to suit your workload: setting it too low may prevent the autoconvert feature from being used, and setting it too high may result in GC pauses, which can adversely affect query performance.
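The n-1 rule can be illustrated with a simplified Python model. It is not Hive's optimizer logic: the function name is hypothetical, sizes are in MB as in this guide, and the sketch assumes the largest table is the one left out of the sum.

```python
def qualifies_for_map_join(table_sizes_mb, threshold_mb: int = 2048) -> bool:
    """Check whether the n-1 smaller inputs of an n-way join fit under the threshold."""
    if len(table_sizes_mb) < 2:
        return False  # a join needs at least two inputs
    # Assumption: the largest table is streamed; the others must fit in memory.
    small_side_total_mb = sum(table_sizes_mb) - max(table_sizes_mb)
    return small_side_total_mb < threshold_mb


print(qualifies_for_map_join([50000, 500, 900]))   # True: 1400 MB < 2048 MB
print(qualifies_for_map_join([50000, 1500, 900]))  # False: 2400 MB >= 2048 MB
```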

Next steps