Optimize Apache HBase with Apache Ambari in Azure HDInsight

Apache Ambari is a web interface for managing and monitoring HDInsight clusters. For an introduction to the Ambari Web UI, see Manage HDInsight clusters by using the Apache Ambari Web UI.

Apache HBase configuration is modified from the HBase Configs tab. The following sections describe some of the important configuration settings that affect HBase performance.

Set HBASE_HEAPSIZE

The HBase heap size specifies the maximum amount of heap, in megabytes, that the region servers and master server can use. The default value is 1,000 MB. Tune this value for the cluster workload.

  1. To modify the value, navigate to the Advanced HBase-env pane in the HBase Configs tab, and then find the HBASE_HEAPSIZE setting.

  2. Change the default value to 5,000 MB.

    Screenshot: Apache Ambari HBase memory heap size

Optimize read-heavy workloads

The following configurations are important for improving the performance of read-heavy workloads.

Block cache size

The block cache is the read cache. Its size is controlled by the hfile.block.cache.size parameter. The default value is 0.4, which is 40 percent of the total region server memory. A larger block cache generally makes random reads faster.

  1. To modify this parameter, navigate to the Settings tab in the HBase Configs tab, and then locate % of RegionServer Allocated to Read Buffers.

    Screenshot: Apache HBase memory block cache size

  2. To change the value, select the Edit icon.

Memstore size

All edits are stored in a memory buffer called the Memstore. This buffer increases the total amount of data that can be written to disk in a single operation. It also speeds up access to recent edits. The Memstore size is defined by the following two parameters:

  • hbase.regionserver.global.memstore.UpperLimit: Defines the maximum percentage of the region server memory that all Memstores combined can use.

  • hbase.regionserver.global.memstore.LowerLimit: Defines the minimum percentage of the region server memory that all Memstores combined can use.

To optimize for random reads, you can reduce the Memstore upper and lower limits, which leaves more region server memory available for the block cache.
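To make the trade-off between the block cache and the Memstore concrete, the following minimal Python sketch converts the two fractions into megabytes for a given heap size. The function name `heap_budget` and the example values are illustrative assumptions, not HDInsight defaults. The 0.8 ceiling is also stated as an assumption here: it reflects HBase's check that the block cache and the global Memstore limit together leave about 20 percent of the heap free.

```python
def heap_budget(heap_mb, block_cache_fraction, memstore_upper_fraction):
    """Split a region server heap into read-cache and write-buffer budgets (MB)."""
    combined = block_cache_fraction + memstore_upper_fraction
    # Assumption: HBase rejects configurations where the two fractions exceed 0.8.
    if combined > 0.8 + 1e-9:
        raise ValueError(f"block cache + Memstore = {combined:.2f}, above 0.8")
    return {
        "block_cache_mb": heap_mb * block_cache_fraction,
        "memstore_mb": heap_mb * memstore_upper_fraction,
        "remaining_mb": heap_mb * (1 - combined),
    }


# Read-heavy example (hypothetical values): shrink the Memstore limit
# to make room for a larger block cache on a 5,000 MB heap.
budget = heap_budget(heap_mb=5000, block_cache_fraction=0.5, memstore_upper_fraction=0.3)
print(budget)
```

Raising one fraction means lowering the other, so the sketch fails loudly instead of returning a budget that the region server would not accept.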

Number of rows fetched when scanning from disk

The hbase.client.scanner.caching setting defines the number of rows read from disk when the next method is called on a scanner. The default value is 100. The higher the number, the fewer remote calls the client makes to the region server, which results in faster scans. However, a higher value also increases memory pressure on the client.

Screenshot: Apache HBase number of rows fetched

Important

Do not set the value so high that the time between invocations of the next method on a scanner exceeds the scanner timeout. The scanner timeout duration is defined by the hbase.regionserver.lease.period property.
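The trade-off described above can be sketched with a simplified cost model in Python. Everything in it is an assumption made for illustration: the function name `scan_estimate`, the average row size, the per-row client processing time, and the 60,000 ms timeout passed in the example. It also ignores any size-based limit on a batch, so treat the numbers as rough guidance rather than measured behavior.

```python
import math


def scan_estimate(total_rows, caching, avg_row_bytes, millis_per_row, scanner_timeout_ms):
    """Rough cost model for a scan at a given hbase.client.scanner.caching value."""
    rpc_calls = math.ceil(total_rows / caching)    # one remote fetch per batch of rows
    client_buffer_bytes = caching * avg_row_bytes  # rows buffered on the client per batch
    batch_millis = caching * millis_per_row        # client work between two fetches
    return {
        "rpc_calls": rpc_calls,
        "client_buffer_bytes": client_buffer_bytes,
        "exceeds_timeout": batch_millis > scanner_timeout_ms,
    }


# Hypothetical workload: 1,000,000 rows of about 1 KB, 10 ms of client work per row.
for caching in (100, 1000, 10000):
    print(caching, scan_estimate(1_000_000, caching, 1024, 10, 60_000))
```

In this model, a larger caching value cuts the number of remote calls but grows the client buffer, and a slow client can spend longer on one batch than the scanner timeout allows.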

Optimize write-heavy workloads

The following configurations are important for improving the performance of write-heavy workloads.

Maximum region file size

HBase stores data in an internal file format called HFile. The property hbase.hregion.max.filesize defines the maximum size of a single HFile for a region. A region is split into two regions if the sum of all HFiles in the region is greater than this setting.

Screenshot: Apache HBase HRegion max file size

The larger the region file size, the fewer the splits. You can increase the file size and measure the results to find the value that gives the best write performance.
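As a rough illustration of how the region file size affects the number of regions (and therefore splits), the Python sketch below estimates a lower bound on the region count for a table. The function name `min_regions` and the 1,024 GB table size are hypothetical, and the estimate assumes every region has grown all the way to the split threshold, which real tables rarely do.

```python
import math


def min_regions(table_size_gb, max_filesize_gb):
    """Lower bound on region count if every region is filled to the split threshold."""
    return math.ceil(table_size_gb / max_filesize_gb)


# Hypothetical 1,024 GB table at three candidate values of hbase.hregion.max.filesize.
for max_gb in (3, 10, 20):
    print(max_gb, min_regions(1024, max_gb))
```

Because a freshly split region holds roughly half the threshold, the actual region count is usually higher than this bound.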

Avoid update blocking

  • The property hbase.hregion.memstore.flush.size defines the size at which the Memstore is flushed to disk. The default size is 128 MB.

  • The HBase region block multiplier is defined by hbase.hregion.memstore.block.multiplier. The default value is 4. The maximum allowed value is 8.

  • HBase blocks updates when the Memstore reaches (hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier) bytes.

    With the default values of flush size and block multiplier, updates are blocked when the Memstore is 128 * 4 = 512 MB in size. To reduce the update blocking count, increase the value of hbase.hregion.memstore.block.multiplier.
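The blocking threshold arithmetic above can be written as a small Python sketch. The function name `blocking_threshold_mb` is hypothetical, and the values are expressed in MB for readability even though the property itself is set in bytes.

```python
def blocking_threshold_mb(flush_size_mb=128, block_multiplier=4):
    """Memstore size (MB) at which HBase blocks updates to a region."""
    return flush_size_mb * block_multiplier


# Compare the default multiplier with larger values, up to the maximum of 8.
for multiplier in (4, 6, 8):
    print(multiplier, blocking_threshold_mb(block_multiplier=multiplier))
```

With the default 128 MB flush size, raising the multiplier from 4 to 8 doubles the headroom before updates are blocked, from 512 MB to 1,024 MB.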

Screenshot: Apache HBase region block multiplier

Define Memstore size

Memstore size is defined by the hbase.regionserver.global.memstore.UpperLimit and hbase.regionserver.global.memstore.LowerLimit parameters. Setting these values equal to each other reduces pauses during writes (while also causing more frequent flushing) and results in increased write performance.

Set Memstore local allocation buffer

Memstore local allocation buffer usage is determined by the property hbase.hregion.memstore.mslab.enabled. When enabled (true), this setting prevents heap fragmentation during heavy write operations. The default value is true.

Screenshot: hbase.hregion.memstore.mslab.enabled property

Next steps