Resource governance

When you're running multiple services on the same node or cluster, it is possible that one service might consume more resources, starving other services in the process. This problem is referred to as the "noisy neighbor" problem. Azure Service Fabric enables the developer to control this behavior by specifying requests and limits per service to limit resource usage.

Note

Before you proceed with this article, we recommend that you get familiar with the Service Fabric application model and the Service Fabric hosting model.

Resource governance metrics

Resource governance is supported in Service Fabric per service package. The resources that are assigned to a service package can be further divided between its code packages. Service Fabric supports CPU and memory governance per service package, with two built-in metrics:

  • CPU (metric name servicefabric:/_CpuCores): A logical core that is available on the host machine. All cores across all nodes are weighted the same.

  • Memory (metric name servicefabric:/_MemoryInMB): Memory is expressed in megabytes, and it maps to physical memory that is available on the machine.

For these two metrics, the Cluster Resource Manager (CRM) tracks total cluster capacity, the load on each node in the cluster, and the remaining resources in the cluster. These two metrics are equivalent to any other user or custom metric, and all existing features can be used with them:

  • The cluster can be balanced according to these two metrics (the default behavior).
  • The cluster can be defragmented according to these two metrics.
  • When describing a cluster, buffered capacity can be set for these two metrics.

Note

Dynamic load reporting is not supported for these metrics; loads for these metrics are defined at creation time.

Resource governance mechanism

Starting with version 7.2, the Service Fabric runtime supports specification of requests and limits for CPU and memory resources.

Note

Service Fabric runtime versions older than 7.2 only support a model where a single value serves as both the request and the limit for a particular resource (CPU or memory). That model is described as the RequestsOnly specification in this document.

  • Requests: CPU and memory request values represent the loads used by the Cluster Resource Manager (CRM) for the servicefabric:/_CpuCores and servicefabric:/_MemoryInMB metrics. In other words, CRM considers the resource consumption of a service to be equal to its request values, and uses these values when making placement decisions.

  • Limits: CPU and memory limit values represent the actual resource limits applied when a process or a container is activated on a node.

Service Fabric allows RequestsOnly, LimitsOnly, and RequestsAndLimits specifications for CPU and memory.

  • When the RequestsOnly specification is used, Service Fabric also uses the request values as limits.
  • When the LimitsOnly specification is used, Service Fabric considers the request values to be 0.
  • When the RequestsAndLimits specification is used, the limit values must be greater than or equal to the request values.
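The three specification modes can be summarized as deriving an effective (request, limit) pair per resource. Below is an illustrative sketch; the function name and the tuple convention are assumptions for illustration, not Service Fabric APIs:

```python
def effective_governance(request=None, limit=None):
    """Derive the effective (request, limit) pair for one resource
    (CPU cores or memory in MB) from a governance specification.

    RequestsOnly:      the request value is also used as the limit.
    LimitsOnly:        the request is considered to be 0.
    RequestsAndLimits: the limit must be >= the request.
    """
    if request is not None and limit is None:      # RequestsOnly
        return request, request
    if request is None and limit is not None:      # LimitsOnly
        return 0, limit
    if request is not None and limit is not None:  # RequestsAndLimits
        if limit < request:
            raise ValueError("limit must be greater than or equal to request")
        return request, limit
    return 0, None  # no governance specified for this resource

assert effective_governance(request=1) == (1, 1)           # RequestsOnly
assert effective_governance(limit=2) == (0, 2)             # LimitsOnly
assert effective_governance(request=1, limit=2) == (1, 2)  # RequestsAndLimits
```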

To better understand the resource governance mechanism, let's look at an example placement scenario with a RequestsOnly specification for the CPU resource (the mechanism for memory governance is equivalent). Consider a node with two CPU cores and two service packages that will be placed on it. The first service package to be placed is composed of just one container code package, and specifies only a request of one CPU core. The second service package to be placed is composed of just one process-based code package, and also specifies only a request of one CPU core. Because both service packages have a RequestsOnly specification, their limit values are set to their request values.

  1. First, the container-based service package requesting one CPU core is placed on the node. The runtime activates the container and sets the CPU limit to one core. The container won't be able to use more than one core.

  2. Next, the process-based service package requesting one CPU core is placed on the node. The runtime activates the service process and sets its CPU limit to one core.

At this point, the sum of requests is equal to the capacity of the node. CRM won't place any more containers or service processes with CPU requests on this node. On the node, the process and the container each run with one core and don't contend with each other for CPU.
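The placement logic in this scenario reduces to a capacity check over request values; limits play no role in placement. A minimal sketch (the function and variable names are assumptions, not CRM internals):

```python
def can_place(node_capacity, placed_requests, new_request):
    """CRM-style check for one resource metric: a service package fits
    only if the requests already placed on the node, plus the new
    request, do not exceed the node capacity. Limits are not checked."""
    return sum(placed_requests) + new_request <= node_capacity

node_cpu_capacity = 2  # the two-core node from the example
placed = []

assert can_place(node_cpu_capacity, placed, 1)      # container fits
placed.append(1)
assert can_place(node_cpu_capacity, placed, 1)      # process fits
placed.append(1)
assert not can_place(node_cpu_capacity, placed, 1)  # node is now full
```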

Let's now revisit our example with a RequestsAndLimits specification. This time, the container-based service package specifies a request of one CPU core and a limit of two CPU cores. The process-based service package specifies both a request and a limit of one CPU core.

  1. First, the container-based service package is placed on the node. The runtime activates the container and sets the CPU limit to two cores. The container won't be able to use more than two cores.
  2. Next, the process-based service package is placed on the node. The runtime activates the service process and sets its CPU limit to one core.

At this point, the sum of CPU requests of the service packages placed on the node is equal to the CPU capacity of the node. CRM won't place any more containers or service processes with CPU requests on this node. However, on the node, the sum of limits (two cores for the container plus one core for the process) exceeds the capacity of two cores. If the container and the process burst at the same time, they may contend for the CPU resource. Such contention is managed by the underlying operating system. In this example, the container could burst up to two CPU cores, so the process's request of one CPU core would not be guaranteed.

Note

As illustrated in the previous example, the request values for CPU and memory do not lead to reservation of resources on a node. These values represent the resource consumption that the Cluster Resource Manager considers when making placement decisions. Limit values represent the actual resource limits applied when a process or a container is activated on a node.

There are a few situations in which there might be contention for CPU. In these situations, the process and container from our example might experience the noisy neighbor problem:

  • Mixing governed and non-governed services and containers: If a user creates a service without any resource governance specified, the runtime sees it as consuming no resources, and can place it on the node in our example. In this case, the new process effectively consumes some CPU at the expense of the services that are already running on the node. There are two solutions to this problem: either don't mix governed and non-governed services on the same cluster, or use placement constraints so that these two types of services don't end up on the same set of nodes.

  • When another process is started on the node, outside Service Fabric (for example, an OS service): In this situation, the process outside Service Fabric also contends for CPU with existing services. The solution to this problem is to set up node capacities correctly to account for OS overhead, as shown in the next section.

  • When requests are not equal to limits: As described in the RequestsAndLimits example earlier, requests do not lead to reservation of resources on a node. When a service with limits greater than requests is placed on a node, it may consume resources (if available) up to its limits. In such cases, other services on the node might not be able to consume resources up to their request values.

Cluster setup for enabling resource governance

When a node starts and joins the cluster, Service Fabric detects the available amount of memory and the available number of cores, and then sets the node capacities for those two resources.

To leave buffer space for the operating system, and for other processes that might be running on the node, Service Fabric uses only 80% of the available resources on the node. This percentage is configurable, and can be changed in the cluster manifest.

Here is an example of how to instruct Service Fabric to use 50% of available CPU and 70% of available memory:

<Section Name="PlacementAndLoadBalancing">
    <!-- 0.0 means 0%, and 1.0 means 100%-->
    <Parameter Name="CpuPercentageNodeCapacity" Value="0.5" />
    <Parameter Name="MemoryPercentageNodeCapacity" Value="0.7" />
</Section>

For most customers and scenarios, automatic detection of node capacities for CPU and memory is the recommended configuration (automatic detection is turned on by default). However, if you need full manual setup of node capacities, you can configure them per node type by using the mechanism for describing nodes in the cluster. Here is an example of how to set up a node type with four cores and 2 GB of memory:

    <NodeType Name="MyNodeType">
      <Capacities>
        <Capacity Name="servicefabric:/_CpuCores" Value="4"/>
        <Capacity Name="servicefabric:/_MemoryInMB" Value="2048"/>
      </Capacities>
    </NodeType>

When auto-detection of available resources is enabled and node capacities are manually defined in the cluster manifest, Service Fabric checks that the node has enough resources to support the capacity that the user has defined:

  • If node capacities that are defined in the manifest are less than or equal to the available resources on the node, Service Fabric uses the capacities that are specified in the manifest.

  • If node capacities that are defined in the manifest are greater than the available resources, Service Fabric uses the available resources as the node capacities.
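These two rules amount to capping the manifest-defined capacity at the detected available amount. A one-line sketch (the function name is an assumption for illustration):

```python
def node_capacity(manifest_value, available):
    """When auto-detection is on and a capacity is also defined in the
    cluster manifest, Service Fabric uses the manifest value only if
    the node actually has that many resources; otherwise it falls back
    to the detected available amount."""
    return manifest_value if manifest_value <= available else available

assert node_capacity(manifest_value=4, available=8) == 4    # manifest honored
assert node_capacity(manifest_value=16, available=8) == 8   # capped at available
```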

Auto-detection of available resources can be turned off if it is not required. To turn it off, change the following setting:

<Section Name="PlacementAndLoadBalancing">
    <Parameter Name="AutoDetectAvailableResources" Value="false" />
</Section>

For optimal performance, the following settings should also be turned on in the cluster manifest:

<Section Name="PlacementAndLoadBalancing">
    <Parameter Name="PreventTransientOvercommit" Value="true" />
    <Parameter Name="AllowConstraintCheckFixesDuringApplicationUpgrade" Value="true" />
</Section>

Important

Starting with Service Fabric version 7.0, we have updated the rule for how node resource capacities are calculated in cases where the user manually provides the values for node resource capacities. Consider the following scenario:

  • There are a total of 10 CPU cores on the node.
  • Service Fabric is configured to use 80% of the total resources for user services (the default setting), which leaves a buffer of 20% for the other services running on the node (including Service Fabric system services).
  • The user decides to manually override the node resource capacity for the CPU cores metric, and sets it to 5 cores.

The rule for how the available capacity for Service Fabric user services is calculated has changed in the following way:

  • Before Service Fabric 7.0, the available capacity for user services would be calculated as 5 cores (the capacity buffer of 20% is ignored).
  • Starting with Service Fabric 7.0, the available capacity for user services is calculated as 4 cores (the capacity buffer of 20% is not ignored).
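The scenario above can be checked numerically. A sketch of the two versions of the rule (the function names are assumptions for illustration):

```python
def user_capacity_pre_70(manual_cores, buffer_percentage=0.2):
    # Before 7.0: a manually provided capacity was used as-is;
    # the capacity buffer was ignored.
    return manual_cores

def user_capacity_70(manual_cores, buffer_percentage=0.2):
    # From 7.0: the capacity buffer is applied to the manually
    # provided value as well.
    return manual_cores * (1 - buffer_percentage)

# Node with 10 cores, manual override of 5 cores, default 20% buffer:
assert user_capacity_pre_70(5) == 5     # buffer ignored
assert user_capacity_70(5) == 4.0       # buffer applied
```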

Specify resource governance

Resource governance requests and limits are specified in the application manifest (ServiceManifestImport section). Let's look at a few examples:

Example 1: RequestsOnly specification

<?xml version='1.0' encoding='UTF-8'?>
<ApplicationManifest ApplicationTypeName='TestAppTC1' ApplicationTypeVersion='vTC1' xsi:schemaLocation='http://schemas.microsoft.com/2011/01/fabric ServiceFabricServiceModel.xsd' xmlns='http://schemas.microsoft.com/2011/01/fabric' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
  <ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName='ServicePackageA' ServiceManifestVersion='v1'/>
    <Policies>
      <ServicePackageResourceGovernancePolicy CpuCores="1"/>
      <ResourceGovernancePolicy CodePackageRef="CodeA1" CpuShares="512" MemoryInMB="1024" />
      <ResourceGovernancePolicy CodePackageRef="CodeA2" CpuShares="256" MemoryInMB="1024" />
    </Policies>
  </ServiceManifestImport>

In this example, the CpuCores attribute is used to specify a request of 1 CPU core for ServicePackageA. Because the CPU limit (CpuCoresLimit attribute) is not specified, Service Fabric also uses the specified request value of 1 core as the CPU limit for the service package.

ServicePackageA will only be placed on a node where the remaining CPU capacity, after subtracting the sum of CPU requests of all service packages placed on that node, is greater than or equal to 1 core. On that node, the service package will be limited to one core. The service package contains two code packages (CodeA1 and CodeA2), and both specify the CpuShares attribute. The proportion of CpuShares, 512:256, is used to calculate the CPU limits for the individual code packages. Thus, CodeA1 will be limited to two-thirds of a core, and CodeA2 will be limited to one-third of a core. If CpuShares are not specified for all code packages, Service Fabric divides the CPU limit equally among them.

While the CpuShares specified for code packages represent their relative proportion of the service package's overall CPU limit, memory values for code packages are specified in absolute terms. In this example, the MemoryInMB attribute is used to specify memory requests of 1024 MB for both CodeA1 and CodeA2. Because the memory limit (MemoryInMBLimit attribute) is not specified, Service Fabric also uses the specified request values as the limits for the code packages. The memory request (and limit) for the service package is calculated as the sum of the memory request (and limit) values of its constituent code packages. Thus, for ServicePackageA, the memory request and limit are calculated as 2048 MB.

ServicePackageA will only be placed on a node where the remaining memory capacity, after subtracting the sum of memory requests of all service packages placed on that node, is greater than or equal to 2048 MB. On that node, both code packages will be limited to 1024 MB of memory each. Code packages (containers or processes) won't be able to allocate more memory than this limit, and attempting to do so will result in out-of-memory exceptions.
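The arithmetic in this example can be verified with a short sketch (illustrative Python; the function and variable names are assumptions, not Service Fabric APIs):

```python
def code_package_cpu_limits(package_cpu_limit, shares):
    """Split the service package's CPU limit among its code packages
    in proportion to their CpuShares values."""
    total = sum(shares.values())
    return {name: package_cpu_limit * s / total for name, s in shares.items()}

# ServicePackageA: CPU limit of 1 core, CpuShares 512:256
cpu = code_package_cpu_limits(1.0, {"CodeA1": 512, "CodeA2": 256})
assert abs(cpu["CodeA1"] - 2 / 3) < 1e-9   # two-thirds of a core
assert abs(cpu["CodeA2"] - 1 / 3) < 1e-9   # one-third of a core

# Memory is absolute: the service package request (and limit) is the
# sum of its code packages' values.
memory = {"CodeA1": 1024, "CodeA2": 1024}
assert sum(memory.values()) == 2048        # ServicePackageA memory in MB
```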

Example 2: LimitsOnly specification

<?xml version='1.0' encoding='UTF-8'?>
<ApplicationManifest ApplicationTypeName='TestAppTC1' ApplicationTypeVersion='vTC1' xsi:schemaLocation='http://schemas.microsoft.com/2011/01/fabric ServiceFabricServiceModel.xsd' xmlns='http://schemas.microsoft.com/2011/01/fabric' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
  <ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName='ServicePackageA' ServiceManifestVersion='v1'/>
    <Policies>
      <ServicePackageResourceGovernancePolicy CpuCoresLimit="1"/>
      <ResourceGovernancePolicy CodePackageRef="CodeA1" CpuShares="512" MemoryInMBLimit="1024" />
      <ResourceGovernancePolicy CodePackageRef="CodeA2" CpuShares="256" MemoryInMBLimit="1024" />
    </Policies>
  </ServiceManifestImport>

This example uses the CpuCoresLimit and MemoryInMBLimit attributes, which are available only in Service Fabric versions 7.2 and later. The CpuCoresLimit attribute is used to specify a CPU limit of 1 core for ServicePackageA. Because the CPU request (CpuCores attribute) is not specified, it is considered to be 0. The MemoryInMBLimit attribute is used to specify memory limits of 1024 MB for CodeA1 and CodeA2; because the requests (MemoryInMB attribute) are not specified, they are considered to be 0. The memory request and limit for ServicePackageA are thus calculated as 0 and 2048 MB, respectively. Because both the CPU and memory requests for ServicePackageA are 0, it presents no load for CRM to consider for placement for the servicefabric:/_CpuCores and servicefabric:/_MemoryInMB metrics. Therefore, from a resource governance perspective, ServicePackageA can be placed on any node, regardless of remaining capacity. As in example 1, on the node, CodeA1 will be limited to two-thirds of a core and 1024 MB of memory, and CodeA2 will be limited to one-third of a core and 1024 MB of memory.

Example 3: RequestsAndLimits specification

<?xml version='1.0' encoding='UTF-8'?>
<ApplicationManifest ApplicationTypeName='TestAppTC1' ApplicationTypeVersion='vTC1' xsi:schemaLocation='http://schemas.microsoft.com/2011/01/fabric ServiceFabricServiceModel.xsd' xmlns='http://schemas.microsoft.com/2011/01/fabric' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
  <ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName='ServicePackageA' ServiceManifestVersion='v1'/>
    <Policies>
      <ServicePackageResourceGovernancePolicy CpuCores="1" CpuCoresLimit="2"/>
      <ResourceGovernancePolicy CodePackageRef="CodeA1" CpuShares="512" MemoryInMB="1024" MemoryInMBLimit="3072" />
      <ResourceGovernancePolicy CodePackageRef="CodeA2" CpuShares="256" MemoryInMB="2048" MemoryInMBLimit="4096" />
    </Policies>
  </ServiceManifestImport>

Building upon the first two examples, this example demonstrates specifying both requests and limits for CPU and memory. ServicePackageA has CPU and memory requests of 1 core and 3072 (1024 + 2048) MB, respectively. It can only be placed on a node that has at least 1 core (and 3072 MB) of capacity left after subtracting the sum of all CPU (and memory) requests of all service packages placed on the node from the total CPU (and memory) capacity of the node. On the node, CodeA1 will be limited to two-thirds of 2 cores and 3072 MB of memory, while CodeA2 will be limited to one-third of 2 cores and 4096 MB of memory.

Using application parameters

When specifying resource governance settings, it is possible to use application parameters to manage multiple app configurations. The following example shows the usage of application parameters:

<?xml version='1.0' encoding='UTF-8'?>
<ApplicationManifest ApplicationTypeName='TestAppTC1' ApplicationTypeVersion='vTC1' xsi:schemaLocation='http://schemas.microsoft.com/2011/01/fabric ServiceFabricServiceModel.xsd' xmlns='http://schemas.microsoft.com/2011/01/fabric' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>

  <Parameters>
    <Parameter Name="CpuCores" DefaultValue="4" />
    <Parameter Name="CpuSharesA" DefaultValue="512" />
    <Parameter Name="CpuSharesB" DefaultValue="512" />
    <Parameter Name="MemoryA" DefaultValue="2048" />
    <Parameter Name="MemoryB" DefaultValue="2048" />
  </Parameters>

  <ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName='ServicePackageA' ServiceManifestVersion='v1'/>
    <Policies>
      <ServicePackageResourceGovernancePolicy CpuCores="[CpuCores]"/>
      <ResourceGovernancePolicy CodePackageRef="CodeA1" CpuShares="[CpuSharesA]" MemoryInMB="[MemoryA]" />
      <ResourceGovernancePolicy CodePackageRef="CodeA2" CpuShares="[CpuSharesB]" MemoryInMB="[MemoryB]" />
    </Policies>
  </ServiceManifestImport>

In this example, default parameter values are set for the production environment, where each service package would get 4 cores and 2 GB of memory. It is possible to change the default values with application parameter files. In this example, one parameter file can be used for testing the application locally, where it would get fewer resources than in production:

<!-- ApplicationParameters\Local.xml -->

<Application Name="fabric:/TestApplication1" xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <Parameters>
    <Parameter Name="CpuCores" DefaultValue="2" />
    <Parameter Name="CpuSharesA" DefaultValue="512" />
    <Parameter Name="CpuSharesB" DefaultValue="512" />
    <Parameter Name="MemoryA" DefaultValue="1024" />
    <Parameter Name="MemoryB" DefaultValue="1024" />
  </Parameters>
</Application>

Important

Specifying resource governance with application parameters is available starting with Service Fabric version 6.1.

When application parameters are used to specify resource governance, Service Fabric cannot be downgraded to a version prior to 6.1.

Enforcing the resource limits for user services

While applying resource governance to your Service Fabric services guarantees that those resource-governed services cannot exceed their resource quota, many users still need to run some of their Service Fabric services in ungoverned mode. When using ungoverned Service Fabric services, it is possible to run into situations where "runaway" ungoverned services consume all available resources on the Service Fabric nodes, which can lead to serious issues such as:

  • Resource starvation of other services running on the nodes (including Service Fabric system services)
  • Nodes ending up in an unhealthy state
  • Unresponsive Service Fabric cluster management APIs

To prevent these situations from occurring, Service Fabric allows you to enforce resource limits for all Service Fabric user services running on the node (both governed and ungoverned) to guarantee that user services will never use more than the specified amount of resources. This is achieved by setting the value of the EnforceUserServiceMetricCapacities config in the PlacementAndLoadBalancing section of the ClusterManifest to true. This setting is turned off by default.

<Section Name="PlacementAndLoadBalancing">
    <Parameter Name="EnforceUserServiceMetricCapacities" Value="true"/>
</Section>

Additional remarks:

  • Resource limit enforcement only applies to the servicefabric:/_CpuCores and servicefabric:/_MemoryInMB resource metrics.
  • Resource limit enforcement only works if node capacities for the resource metrics are available to Service Fabric, either via the auto-detection mechanism or via users manually specifying the node capacities (as explained in the Cluster setup for enabling resource governance section). If node capacities are not configured, the resource limit enforcement capability cannot be used, because Service Fabric can't know how many resources to reserve for user services. Service Fabric will issue a health warning if EnforceUserServiceMetricCapacities is true but node capacities are not configured.

Other resources for containers

Besides CPU and memory, it's possible to specify other resource limits for containers. These limits are specified at the code-package level, and are applied when the container is started. Unlike with CPU and memory, the Cluster Resource Manager isn't aware of these resources, and won't do any capacity checks or load balancing for them.

  • MemorySwapInMB: The amount of swap memory that a container can use.
  • MemoryReservationInMB: The soft limit for memory governance that is enforced only when memory contention is detected on the node.
  • CpuPercent: The percentage of CPU that the container can use. If CPU requests or limits are specified for the service package, this parameter is effectively ignored.
  • MaximumIOps: The maximum IOPS that a container can use (read and write).
  • MaximumIOBytesps: The maximum IO (bytes per second) that a container can use (read and write).
  • BlockIOWeight: The block IO weight, relative to other containers.

These resources can be combined with CPU and memory. Here is an example of how to specify additional resources for containers:

<ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1.0"/>
    <Policies>
        <ResourceGovernancePolicy CodePackageRef="FrontendService.Code" CpuPercent="5"
        MemorySwapInMB="4084" MemoryReservationInMB="1024" MaximumIOps="20" />
    </Policies>
</ServiceManifestImport>

Next steps