Selecting the right VM size for your Azure HDInsight cluster

This article discusses how to select the right VM size for the various nodes in your HDInsight cluster.

Begin by understanding how virtual machine properties, such as CPU processing power, RAM size, and network latency, affect the processing of your workloads. Next, think about your application and which VM family is optimized for its needs. Make sure that the VM family you want to use is compatible with the cluster type you plan to deploy. Finally, benchmark some sample workloads to determine which SKU within that family is right for you.

For more information on planning other aspects of your cluster, such as selecting a storage type or cluster size, see Capacity planning for HDInsight clusters.

VM properties and big data workloads

VM sizes and types are characterized by their CPU processing power, RAM size, and network latency:

  • CPU: The VM size dictates the number of cores. The more cores, the greater the degree of parallel computation each node can achieve. Also, some VM types have faster cores.

  • RAM: The VM size also dictates the amount of RAM available in the VM. For workloads that hold data in memory for processing, rather than reading it from disk, ensure your worker nodes have enough memory to fit the data.

  • Network: For most cluster types, the data processed by the cluster isn't on local disk but in an external storage service such as Data Lake Storage or Azure Storage. Consider the network bandwidth and throughput between the node VMs and the storage service. The network bandwidth available to a VM typically increases with VM size. For details, see VM sizes overview.
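To make the RAM guidance concrete, the following Python sketch estimates how many worker nodes a cluster needs for an in-memory working set to fit. The function name is illustrative, and `usable_fraction` is a hypothetical placeholder for the share of each node's RAM left for data after the operating system, cluster services, and framework overhead. The default of 0.7 is an assumption, not an HDInsight figure, so measure the real value on your own cluster.

```python
import math


def min_workers_for_in_memory(dataset_gb: float,
                              ram_per_node_gb: float,
                              usable_fraction: float = 0.7) -> int:
    """Smallest worker count whose combined usable RAM can hold the dataset.

    usable_fraction is a hypothetical share of each node's RAM that remains
    for data after OS, service, and framework overhead (assumed default).
    """
    if dataset_gb < 0 or ram_per_node_gb <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("invalid sizing inputs")
    usable_per_node_gb = ram_per_node_gb * usable_fraction
    # Round up: a partial node's worth of data still needs one more whole node.
    return max(1, math.ceil(dataset_gb / usable_per_node_gb))


# Example: a 500 GB working set on workers with 64 GB of RAM each.
print(min_workers_for_in_memory(500, 64))  # 12
```

Choosing a larger VM size raises `ram_per_node_gb` and lowers the worker count, which is one way to trade node count against VM size.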

Understanding VM optimization

Virtual machine families in Azure are optimized to suit different use cases. The table below lists some of the most popular use cases and the VM families that match them.

Type | Sizes | Description
Entry-level | A, Av2 | CPU performance and memory configurations best suited for entry-level workloads such as development and test. Economical, low-cost option for getting started with Azure.
General purpose | D, DSv2, Dv2 | Balanced CPU-to-memory ratio. Ideal for testing and development, small to medium databases, and low- to medium-traffic web servers.
Compute optimized | F | High CPU-to-memory ratio. Good for medium-traffic web servers, network appliances, batch processes, and application servers.
Memory optimized | Esv3, Ev3 | High memory-to-CPU ratio. Great for relational database servers, medium to large caches, and in-memory analytics.
  • For pricing of the VM instances available across HDInsight-supported regions, see HDInsight Pricing.
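If you script cluster planning, the use-case table above can be encoded as a simple lookup. This is a minimal sketch: `VM_FAMILIES_BY_TYPE` and `families_for` are hypothetical names, and the mapping covers only the families listed in the table, not every VM series Azure offers.

```python
# VM families per workload type, transcribed from the table above.
VM_FAMILIES_BY_TYPE: dict[str, list[str]] = {
    "entry-level": ["A", "Av2"],
    "general purpose": ["D", "DSv2", "Dv2"],
    "compute optimized": ["F"],
    "memory optimized": ["Esv3", "Ev3"],
}


def families_for(workload_type: str) -> list[str]:
    """Return candidate VM families for a workload type (case-insensitive)."""
    key = workload_type.strip().lower()
    if key not in VM_FAMILIES_BY_TYPE:
        raise ValueError(f"unknown workload type: {workload_type!r}")
    # Return a copy so callers can't mutate the shared table.
    return list(VM_FAMILIES_BY_TYPE[key])


print(families_for("Memory optimized"))  # ['Esv3', 'Ev3']
```

The lookup only narrows the choice to a family; the cluster-type compatibility check and benchmarking still decide the specific SKU.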

Cost-saving VM types for light workloads

If you have light processing requirements, the F-series can be a good choice for getting started with HDInsight. With a lower per-hour list price, the F-series offers the best price-performance in the Azure portfolio, based on the Azure Compute Unit (ACU) per vCPU.

The following table describes the cluster types and node types that can be created with Fsv2-series VMs.

Cluster type | Version | Worker node | Head node | ZooKeeper node
Spark | All | F4 and above | No | No
Hadoop | All | F4 and above | No | No
Kafka | All | F4 and above | No | No
HBase | All | F4 and above | No | No
LLAP | Disabled | No | No | No
Storm | Disabled | No | No | No
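The Fsv2 table above can also be expressed as a validation check to run before submitting a cluster definition. This sketch assumes Fsv2 size names follow the `Standard_F<N>s_v2` pattern and reads "F4 and above" as N of at least 4. `MIN_FSV2_SIZE`, `fsv2_size_number`, and `fsv2_allowed` are hypothetical names, and the data is transcribed only from the table.

```python
import re

# Minimum Fsv2 size number per cluster type and node role, transcribed from
# the table above. None means Fsv2 isn't available for that role.
MIN_FSV2_SIZE = {
    "spark": {"worker": 4, "head": None, "zookeeper": None},
    "hadoop": {"worker": 4, "head": None, "zookeeper": None},
    "kafka": {"worker": 4, "head": None, "zookeeper": None},
    "hbase": {"worker": 4, "head": None, "zookeeper": None},
    "llap": {"worker": None, "head": None, "zookeeper": None},
    "storm": {"worker": None, "head": None, "zookeeper": None},
}


def fsv2_size_number(vm_size: str) -> int:
    """Extract N from a name such as 'Standard_F8s_v2' (assumed naming pattern)."""
    match = re.fullmatch(r"(?:Standard_)?F(\d+)s_v2", vm_size, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not an Fsv2 size name: {vm_size!r}")
    return int(match.group(1))


def fsv2_allowed(cluster_type: str, node_role: str, vm_size: str) -> bool:
    """True if the table permits this Fsv2 size for the cluster type and role.

    Unknown cluster types or node roles raise KeyError.
    """
    minimum = MIN_FSV2_SIZE[cluster_type.lower()][node_role.lower()]
    return minimum is not None and fsv2_size_number(vm_size) >= minimum


print(fsv2_allowed("Spark", "worker", "Standard_F8s_v2"))  # True
print(fsv2_allowed("Spark", "head", "Standard_F8s_v2"))    # False
```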

Benchmarking

Benchmarking is the process of running simulated workloads on different VMs to measure how well they will perform for your production workloads.

For more information on benchmarking VM SKUs and cluster sizes, see Cluster capacity planning in Azure HDInsight.
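Once you have benchmark results, one way to compare SKUs is to pick the cheapest configuration that still meets your runtime target. The sketch below assumes you record, for each benchmark cluster, the SKU, the worker count, the measured runtime of the sample workload, and the per-node hourly price from HDInsight Pricing. `BenchmarkRun` and `cheapest_within_deadline` are hypothetical names, and the cost counts only worker nodes, so add head and ZooKeeper nodes if you want a full estimate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    sku: str                    # VM size used by the benchmark cluster
    worker_count: int
    runtime_hours: float        # measured runtime of the sample workload
    price_per_node_hour: float  # per-node list price for your region

    @property
    def worker_cost(self) -> float:
        # Worker nodes only; head and ZooKeeper nodes are not included.
        return self.worker_count * self.runtime_hours * self.price_per_node_hour


def cheapest_within_deadline(runs: list[BenchmarkRun],
                             deadline_hours: float) -> BenchmarkRun | None:
    """Cheapest run that finished within the deadline, or None if none did."""
    fast_enough = [r for r in runs if r.runtime_hours <= deadline_hours]
    return min(fast_enough, key=lambda r: r.worker_cost, default=None)
```

The function only compares measurements you supply; it doesn't run any workload itself, so the quality of the choice depends on how closely the sample workloads resemble production.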

Next steps