Capacity planning for Service Fabric applications

This document teaches you how to estimate the amount of resources (CPUs, RAM, disk storage) you need to run your Azure Service Fabric applications. It is common for your resource requirements to change over time. You typically require few resources as you develop and test your service, and then require more as you go into production and your application grows in popularity. When you design your application, think through the long-term requirements and make choices that allow your service to scale to meet high customer demand.

When you create a Service Fabric cluster, you decide what kinds of virtual machines (VMs) make up the cluster. Each VM comes with a limited amount of resources in the form of CPUs (cores and speed), network bandwidth, RAM, and disk storage. As your service grows over time, you can upgrade to VMs that offer greater resources and/or add more VMs to your cluster. To do the latter, you must architect your service initially so it can take advantage of new VMs that get dynamically added to the cluster.

Some services manage little to no data on the VMs themselves. Therefore, capacity planning for these services should focus primarily on performance, which means selecting the appropriate CPUs (cores and speed) for the VMs. In addition, you should consider network bandwidth, including how frequently network transfers occur and how much data is transferred. If your service needs to keep performing well as usage increases, you can add more VMs to the cluster and load balance the network requests across all the VMs.

For services that manage large amounts of data on the VMs, capacity planning should focus primarily on size. Thus, you should carefully consider the capacity of the VM's RAM and disk storage. The virtual memory management system in Windows makes disk space look like RAM to application code. In addition, the Service Fabric runtime provides smart paging, keeping only hot data in memory and moving cold data to disk. Applications can thus use more memory than is physically available on the VM. Having more RAM simply increases performance, since the VM can keep more of the data in RAM instead of on disk. The VM you select should have a disk large enough to store the data that you want on the VM. Similarly, the VM should have enough RAM to provide you with the performance you desire. If your service's data grows over time, you can add more VMs to the cluster and partition the data across all the VMs.

Determine how many nodes you need

Partitioning your service allows you to scale out your service's data. For more information on partitioning, see Partitioning Service Fabric. Each partition must fit within a single VM, but multiple (small) partitions can be placed on a single VM. So, having many small partitions gives you greater flexibility than having a few larger partitions. The trade-off is that having lots of partitions increases Service Fabric overhead, and you cannot perform transacted operations across partitions. There is also more potential network traffic if your service code frequently needs to access pieces of data that live in different partitions. When designing your service, you should carefully consider these pros and cons to arrive at an effective partitioning strategy.

Let's assume your application has a single stateful service with a store size that you expect to grow to DB_Size GB in a year. You are willing to add more applications (and partitions) as you experience growth beyond that year. The replication factor (RF), which determines the number of replicas for your service, impacts the total DB_Size. The total DB_Size across all replicas is the replication factor multiplied by DB_Size. Node_Size represents the disk space/RAM per node you want to use for your service. For best performance, the DB_Size should fit into memory across the cluster, and you should choose a Node_Size that is around the RAM of the VM. By allocating a Node_Size that is larger than the RAM capacity, you are relying on the paging provided by the Service Fabric runtime. Thus, your performance may not be optimal if all of your data is considered hot (because the data is then constantly paged in and out). However, for many services where only a fraction of the data is hot, it is more cost-effective.

The number of nodes required for maximum performance can be computed as follows:

Number of Nodes = (DB_Size * RF)/Node_Size
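
As a rough illustration, the formula can be evaluated with a few lines of Python. The function name and the sample values (a 1,024 GB store, an RF of 3, and 112 GB per node) are assumptions made for this sketch, as is the choice to round the result up to a whole node.

```python
import math


def nodes_required(db_size_gb: float, rf: int, node_size_gb: float) -> int:
    """Number of Nodes = (DB_Size * RF) / Node_Size, rounded up to a whole node."""
    if db_size_gb < 0 or rf < 1 or node_size_gb <= 0:
        raise ValueError("db_size_gb must be >= 0, rf >= 1, node_size_gb > 0")
    # Rounding up is an assumption of this sketch: a fractional node cannot be provisioned.
    return math.ceil(db_size_gb * rf / node_size_gb)


# Hypothetical inputs, not taken from a real cluster.
print(nodes_required(db_size_gb=1024, rf=3, node_size_gb=112))
```

With these sample inputs, 3,072 GB of replicated data divided by 112 GB per node gives about 27.4, which rounds up to 28 nodes.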

Account for growth

You may want to compute the number of nodes based on the DB_Size that you expect your service to grow to, in addition to the DB_Size that you begin with. Then, increase the number of nodes as your service grows so that you are not over-provisioning. However, the number of partitions should be based on the number of nodes that are needed when you're running your service at maximum growth.

It is good to have some extra machines available at any time so that you can handle any unexpected spikes or failures (for example, if a few VMs go down). While the extra capacity should be determined by using your expected spikes, a starting point is to reserve a few extra VMs (5-10 percent extra).
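
The 5-10 percent starting point can be sketched as a small helper. The function name, the integer-percent parameter, and the decision to round the reserve up to a whole VM are assumptions of this example, not part of Service Fabric.

```python
def nodes_with_headroom(base_nodes: int, headroom_percent: int = 10) -> int:
    """Add a reserve of extra VMs on top of the computed node count."""
    if base_nodes < 0 or headroom_percent < 0:
        raise ValueError("base_nodes and headroom_percent must be non-negative")
    # Integer ceiling division avoids floating-point surprises such as 40 * 1.1.
    extra = (base_nodes * headroom_percent + 99) // 100
    return base_nodes + extra


# Hypothetical base of 28 nodes with the low and high ends of the suggested range.
print(nodes_with_headroom(28, 5))
print(nodes_with_headroom(28, 10))
```

Treat the result as a starting point only; the reserve you actually need depends on the spikes you expect.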

The preceding assumes a single stateful service. If you have more than one stateful service, you have to add the DB_Size associated with the other services into the equation. Alternatively, you can compute the number of nodes separately for each stateful service. Your service may have replicas or partitions that aren't balanced. Keep in mind that some partitions may also have more data than others. For more information on partitioning, see the partitioning article on best practices. However, the preceding equation is partition and replica agnostic, because Service Fabric ensures that the replicas are spread out among the nodes in an optimized manner.
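
Both approaches for multiple stateful services can be compared in a short sketch. The service names, sizes, and the 100 GB Node_Size below are hypothetical, and allowing a different RF per service is an assumption of this example.

```python
import math
from typing import NamedTuple


class StatefulService(NamedTuple):
    name: str
    db_size_gb: int  # expected store size of the service
    rf: int          # replication factor of the service


def nodes_combined(services: list[StatefulService], node_size_gb: int) -> int:
    """Add every service's DB_Size * RF into one equation, then divide by Node_Size."""
    total_gb = sum(s.db_size_gb * s.rf for s in services)
    return math.ceil(total_gb / node_size_gb)


def nodes_separate(services: list[StatefulService], node_size_gb: int) -> int:
    """Compute the node count for each service on its own and add the results."""
    return sum(math.ceil(s.db_size_gb * s.rf / node_size_gb) for s in services)


# Hypothetical services, for illustration only.
services = [StatefulService("orders", 520, 3), StatefulService("profiles", 210, 3)]
print(nodes_combined(services, node_size_gb=100))
print(nodes_separate(services, node_size_gb=100))
```

Because each service is rounded up on its own, the separate calculation can come out slightly higher than the combined one (23 versus 22 nodes with these sample values).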

Use a spreadsheet for cost calculation

Now let's put some real numbers in the formula. An example spreadsheet shows how to plan the capacity for an application that contains three types of data objects. For each object type, we approximate its size and how many objects we expect to have. We also select how many replicas we want of each object type. The spreadsheet calculates the total amount of data to be stored in memory across the cluster.
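
The spreadsheet's first step can be approximated in code as size multiplied by count multiplied by replicas, summed over the object types. The object types and numbers below are hypothetical placeholders, not the values from the example spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectType:
    name: str
    size_bytes: int  # approximate size of one object
    count: int       # expected number of objects
    replicas: int    # desired number of replicas


def total_cluster_bytes(object_types: list[ObjectType]) -> int:
    """Total bytes the cluster must hold, counting every replica."""
    return sum(o.size_bytes * o.count * o.replicas for o in object_types)


# Hypothetical inputs, for illustration only.
object_types = [
    ObjectType("user_profile", 2_048, 1_000_000, 3),
    ObjectType("order", 512, 10_000_000, 3),
    ObjectType("catalog_item", 4_096, 50_000, 5),
]
print(total_cluster_bytes(object_types))
```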

Then we enter a VM size and monthly cost. Based on the VM size, the spreadsheet tells you the minimum number of partitions you must use to split your data so that it physically fits on the nodes. You may want a larger number of partitions to accommodate your application's specific computation and network traffic needs. In the example, the spreadsheet shows that the number of partitions managing the user profile objects has been increased from one to six.
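
Since each partition must fit within a single VM, one plausible way to derive the minimum partition count is to divide the data of one object type by the capacity of one VM and round up. This is a sketch under that assumption; the spreadsheet's exact calculation is not shown here, and the sample numbers are hypothetical.

```python
def min_partitions(data_bytes: int, vm_capacity_bytes: int) -> int:
    """Smallest number of partitions such that each one fits on a single VM."""
    if data_bytes < 0 or vm_capacity_bytes <= 0:
        raise ValueError("data_bytes must be >= 0 and vm_capacity_bytes > 0")
    # Integer ceiling division; a service always has at least one partition.
    return max(1, (data_bytes + vm_capacity_bytes - 1) // vm_capacity_bytes)


# Hypothetical: 10 million objects of 512 bytes on VMs with 4 GiB available.
print(min_partitions(512 * 10_000_000, 4 * 1024**3))
```

You can always choose more partitions than this minimum to spread computation and network traffic.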

Now, based on all this information, the spreadsheet shows that you could physically fit all the data, with the desired partitions and replicas, on a 26-node cluster. However, this cluster would be densely packed, so you may want some additional nodes to accommodate node failures and upgrades. The spreadsheet also shows that having more than 57 nodes provides no additional value because you would have empty nodes. Again, you may want to go above 57 nodes anyway to accommodate node failures and upgrades. You can tweak the spreadsheet to match your application's specific needs.
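
One way to reason about the upper bound is that a node holding no replica is empty, so the number of useful nodes cannot exceed the total number of replicas. The sketch below assumes that reading; the partition and replica counts are hypothetical and are not the inputs behind the 26 and 57 figures.

```python
def max_useful_nodes(layout: list[tuple[int, int]]) -> int:
    """Total replica count, given (partitions, replicas) per object type.

    Beyond this many nodes, at least one node would host no replica at all.
    """
    return sum(partitions * replicas for partitions, replicas in layout)


# Hypothetical layout: (partitions, replicas) for three object types.
layout = [(6, 3), (10, 3), (1, 5)]
print(max_useful_nodes(layout))
```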


Next steps

Check out Partitioning Service Fabric services to learn more about partitioning your service.