Load Balancer health probes

When using load-balancing rules with Azure Load Balancer, you need to specify health probes so that Load Balancer can detect the backend endpoint status. The configuration of the health probe and the probe responses determine which backend pool instances receive new flows. You can use health probes to detect the failure of an application on a backend endpoint. You can also generate a custom response to a health probe and use the health probe for flow control to manage load or planned downtime. When a health probe fails, Load Balancer stops sending new flows to the respective unhealthy instance. Outbound connectivity is not impacted; only inbound connectivity is.

Health probes support multiple protocols. The availability of a specific health probe protocol varies by Load Balancer SKU. Additionally, the behavior of the service varies by Load Balancer SKU, as shown in this table:

| | Standard SKU | Basic SKU |
| --- | --- | --- |
| Probe types | TCP, HTTP, HTTPS | TCP, HTTP |
| Probe down behavior | All probes down, all TCP flows continue. | All probes down, all TCP flows expire. |

Important

Review this document in its entirety, including the important design guidance below, to create a reliable service.

Important

Load Balancer health probes originate from the IP address 168.63.129.16 and must not be blocked if probes are to mark your instance as up. Review probe source IP address for details.

Probe configuration

Health probe configuration consists of the following elements:

  • Duration of the interval between individual probes
  • Number of probe responses that have to be observed before the probe transitions to a different state
  • Protocol of the probe
  • Port of the probe
  • HTTP path to use for HTTP GET when using HTTP(S) probes

Note

A probe definition is not mandatory or checked for when using Azure PowerShell, Azure CLI, templates, or the API. Probe validation tests are only done when using the Azure portal.

Understanding application signal, detection of the signal, and reaction of the platform

The number of probe responses applies to both

  • the number of successful probes that allow an instance to be marked as up, and
  • the number of failed probes that cause an instance to be marked as down.

The timeout and interval values specified determine whether an instance will be marked as up or down. The duration of the interval multiplied by the number of probe responses determines the duration during which the probe responses have to be detected. The service reacts once the required number of probe responses has been observed.
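The threshold logic described above can be sketched as a small state tracker. This is a simplified model for illustration only, not the platform's implementation: the class name `ProbeState`, the assumption that an instance starts out marked down, and the use of consecutive responses as the counting rule are all assumptions, and the sketch ignores the protocol-specific behavior described in the probe type sections.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Simplified model of the probe state of one backend instance."""

    number_of_probes: int   # responses required before the state changes
    healthy: bool = False   # assumption: an instance starts out marked down
    _streak: int = 0        # consecutive responses that disagree with `healthy`

    def observe(self, success: bool) -> bool:
        """Record one probe response and return the resulting state."""
        if success == self.healthy:
            # The response agrees with the current state, so reset the streak.
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.number_of_probes:
                # Enough disagreeing responses in a row: flip the state.
                self.healthy = success
                self._streak = 0
        return self.healthy
```

With `number_of_probes=2`, a single failed response between successful ones leaves the instance marked up in this model; two failures in a row mark it down.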

We can illustrate the behavior further with an example. If you have set the number of probe responses to 2 and the interval to 5 seconds, 2 probe failures must be observed within a 10-second interval. Because the time at which a probe is sent is not synchronized with the time at which your application changes state, we can bound the time to detect by two scenarios:

  1. If your application starts producing a failing probe response just before the first probe arrives, the detection of these events will take 10 seconds (2 x 5-second intervals) plus the time between the application starting to signal a failure and the first probe arriving. You can assume this detection to take slightly over 10 seconds.
  2. If your application starts producing a failing probe response just after the first probe arrives, the detection of these events will not begin until the next probe arrives (and fails), plus another 10 seconds (2 x 5-second intervals). You can assume this detection to take just under 15 seconds.

For this example, once detection has occurred, the platform will then take a small amount of time to react to this change. This means that, depending on

  1. when the application begins changing state,
  2. when this change is detected and has met the required criteria (number of probes sent at the specified interval), and
  3. when the detection has been communicated across the platform,

you can assume the reaction to a failing probe will take between a minimum of just over 10 seconds and a maximum of slightly over 15 seconds after the signal from the application changes. This example is provided to illustrate what is taking place; it is not possible to forecast an exact duration beyond the rough guidance given here.
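The arithmetic behind these two scenarios can be written down as a short sketch. The function name `detection_bounds` is hypothetical, and the result covers only the detection window; the small, unspecified platform reaction time mentioned above is not modeled.

```python
from typing import Tuple


def detection_bounds(interval_seconds: float, number_of_probes: int) -> Tuple[float, float]:
    """Return rough (lower, upper) bounds in seconds for detecting a failure.

    Lower bound: the failure starts just before a probe arrives, so detection
    needs about number_of_probes intervals.
    Upper bound: the failure starts just after a probe arrives, so almost one
    extra interval passes before the first failing probe is seen.
    """
    lower = interval_seconds * number_of_probes
    upper = interval_seconds * (number_of_probes + 1)
    return lower, upper


# The example from the text: 2 probe responses at a 5-second interval.
print(detection_bounds(5, 2))  # (10, 15)
```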

Probe types

The protocol used by the health probe can be configured as TCP, HTTP, or HTTPS.

The available protocols depend on the Load Balancer SKU used:

| | TCP | HTTP | HTTPS |
| --- | --- | --- | --- |
| Standard SKU | Yes | Yes | Yes |
| Basic SKU | Yes | Yes | No |

TCP probe

TCP probes initiate a connection by performing a three-way open TCP handshake with the defined port. TCP probes terminate a connection with a four-way close TCP handshake.

The minimum probe interval is 5 seconds and the minimum number of unhealthy responses is 2. The total duration of all intervals cannot exceed 120 seconds.
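Because probe definitions are only validated in the Azure portal, it can be useful to check these limits yourself before deploying through other tools. The following is a minimal sketch of such a check. The function name `probe_config_problems` is hypothetical, and it assumes that "the total duration of all intervals" means the interval multiplied by the number of probes.

```python
from typing import List

MIN_INTERVAL_SECONDS = 5    # minimum probe interval stated in the text
MIN_NUMBER_OF_PROBES = 2    # minimum number of unhealthy responses
MAX_TOTAL_SECONDS = 120     # limit on the total duration of all intervals


def probe_config_problems(interval_in_seconds: int, number_of_probes: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if interval_in_seconds < MIN_INTERVAL_SECONDS:
        problems.append(f"interval must be at least {MIN_INTERVAL_SECONDS} seconds")
    if number_of_probes < MIN_NUMBER_OF_PROBES:
        problems.append(f"number of probes must be at least {MIN_NUMBER_OF_PROBES}")
    # Assumption: total duration = interval x number of probes.
    if interval_in_seconds * number_of_probes > MAX_TOTAL_SECONDS:
        problems.append(f"total duration must not exceed {MAX_TOTAL_SECONDS} seconds")
    return problems
```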

A TCP probe fails when:

  • The TCP listener on the instance doesn't respond at all during the timeout period. A probe is marked down once the configured number of probe requests has gone unanswered.
  • The probe receives a TCP reset from the instance.
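To check from another machine whether a listener would pass this kind of probe, you can approximate a single TCP probe with a plain connection attempt. This is an illustrative sketch, not the probing service itself; the function name `tcp_probe` and the default timeout are assumptions.

```python
import socket


def tcp_probe(host: str, port: int, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened and closed."""
    try:
        # create_connection performs the three-way handshake; leaving the
        # with-block closes the connection normally.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers no response within the timeout as well as a refused
        # connection (TCP reset), the two failure cases listed above.
        return False
```

A single `False` result corresponds to one failed probe response; the instance is only marked down after the configured number of failures.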

The following illustrates how you could express this kind of probe configuration in a Resource Manager template:

    {
      "name": "tcp",
      "properties": {
        "protocol": "Tcp",
        "port": 1234,
        "intervalInSeconds": 5,
        "numberOfProbes": 2
      }
    }

HTTP / HTTPS probe

Note

The HTTPS probe is only available for Standard Load Balancer.

HTTP and HTTPS probes build on the TCP probe and issue an HTTP GET with the specified path. Both of these probes support relative paths for the HTTP GET. HTTPS probes are the same as HTTP probes with the addition of a Transport Layer Security (TLS, formerly known as SSL) wrapper. The health probe is marked up when the instance responds with an HTTP status 200 within the timeout period. The health probe attempts to check the configured health probe port every 15 seconds by default. The minimum probe interval is 5 seconds. The total duration of all intervals cannot exceed 120 seconds.

HTTP / HTTPS probes can also be useful for implementing your own logic to remove instances from load balancer rotation if the probe port is also the listener for the service itself. For example, you might decide to remove an instance when it is above 90% CPU by returning a non-200 HTTP status.
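The CPU example could be implemented along the following lines. This is a minimal sketch using Python's standard `http.server` module; `probe_status`, `make_handler`, `serve`, and the `read_cpu_percent` callable are hypothetical names, and how you measure CPU utilization is left to you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

CPU_THRESHOLD_PERCENT = 90.0  # threshold taken from the example in the text


def probe_status(cpu_percent: float) -> int:
    """Return the HTTP status the health endpoint should send."""
    if cpu_percent > CPU_THRESHOLD_PERCENT:
        return 503  # any status other than 200 fails the probe
    return 200


def make_handler(read_cpu_percent: Callable[[], float]):
    """Build a handler class; read_cpu_percent is a callable you supply."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Answer every GET with the current health status and no body.
            self.send_response(probe_status(read_cpu_percent()))
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep frequent probe requests out of the log

    return HealthHandler


def serve(port: int, read_cpu_percent: Callable[[], float]) -> None:
    """Serve the health endpoint until the process is stopped."""
    HTTPServer(("", port), make_handler(read_cpu_percent)).serve_forever()
```

Calling `serve(8080, my_cpu_reader)` would expose the endpoint on port 8080, where `my_cpu_reader` stands for whatever function you use to read CPU utilization. Returning 503 here is a design choice; the probe treats every non-200 status as a failure.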

Note

HTTPS probes require certificates that have a minimum signature hash of SHA256 in the entire chain.

If you use Cloud Services and have web roles that use w3wp.exe, you also achieve automatic monitoring of your website. Failures in your website code return a non-200 status to the load balancer probe.

An HTTP / HTTPS probe fails when:

  • The probe endpoint returns an HTTP response code other than 200 (for example, 403, 404, or 500). This marks down the health probe immediately.
  • The probe endpoint doesn't respond at all within the smaller of the probe interval and the 30-second timeout period. Multiple probe requests might go unanswered before the probe is marked as not running, which happens once the sum of all timeout intervals has been reached.
  • The probe endpoint closes the connection via a TCP reset.

The following illustrates how you could express this kind of probe configuration in a Resource Manager template:

    {
      "name": "http",
      "properties": {
        "protocol": "Http",
        "port": 80,
        "requestPath": "/",
        "intervalInSeconds": 5,
        "numberOfProbes": 2
      }
    },
    {
      "name": "https",
      "properties": {
        "protocol": "Https",
        "port": 443,
        "requestPath": "/",
        "intervalInSeconds": 5,
        "numberOfProbes": 2
      }
    }

Guest agent probe (Classic only)

Cloud service roles (worker roles and web roles) use a guest agent for probe monitoring by default. A guest agent probe is a last-resort configuration. Always define a health probe explicitly as a TCP or HTTP probe. A guest agent probe is not as effective as explicitly defined probes for most application scenarios.

A guest agent probe is a check of the guest agent inside the VM. The guest agent listens and responds with an HTTP 200 OK response only when the instance is in the Ready state. (Other states are Busy, Recycling, or Stopping.)

For more information, see Configure the service definition file (csdef) for health probes or Get started by creating a public load balancer for cloud services.

If the guest agent fails to respond with HTTP 200 OK, the load balancer marks the instance as unresponsive. It then stops sending flows to that instance. The load balancer continues to check the instance.

If the guest agent responds with an HTTP 200, the load balancer sends new flows to that instance again.

When you use a web role, the website code typically runs in w3wp.exe, which isn't monitored by the Azure fabric or guest agent. Failures in w3wp.exe (for example, HTTP 500 responses) aren't reported to the guest agent. Consequently, the load balancer doesn't take that instance out of rotation.

Probe up behavior

TCP, HTTP, and HTTPS health probes are considered healthy and mark the backend endpoint as healthy when:

  • The health probe is successful once after the VM boots.
  • The specified number of probes required to mark the backend endpoint as healthy has been achieved.

Any backend endpoint that has achieved a healthy state is eligible for receiving new flows.

Note

If the health probe fluctuates, the load balancer waits longer before it puts the backend endpoint back in the healthy state. This extra wait time protects the user and the infrastructure and is an intentional policy.

Probe down behavior

TCP connections

New TCP connections will succeed to the remaining healthy backend endpoints.

If a backend endpoint's health probe fails, established TCP connections to this backend endpoint continue.

If all probes for all instances in a backend pool fail, no new flows will be sent to the backend pool. Standard Load Balancer will permit established TCP flows to continue. Basic Load Balancer will terminate all existing TCP flows to the backend pool.

Load Balancer is a pass-through service (it does not terminate TCP connections), and the flow is always between the client and the VM's guest OS and application. A pool with all probes down will cause a frontend to not respond to TCP connection open attempts (SYN), because there is no healthy backend endpoint to receive the flow and respond with a SYN-ACK.

UDP datagrams

UDP datagrams will be delivered to healthy backend endpoints.

UDP is connectionless and there is no flow state tracked for UDP. If any backend endpoint's health probe fails, existing UDP flows will move to another healthy instance in the backend pool.

If all probes for all instances in a backend pool fail, existing UDP flows will terminate for Basic and Standard Load Balancers.

Probe source IP address

Load Balancer uses a distributed probing service for its internal health model. The probing service resides on each host where VMs run and can be programmed on demand to generate health probes per the customer's configuration. The health probe traffic flows directly between the probing service that generates the health probe and the customer VM. All Load Balancer health probes originate from the IP address 168.63.129.16 as their source. You can use IP address space inside of a VNet that is not RFC1918 space. Using a globally reserved, Azure-owned IP address reduces the chance of an IP address conflict with the IP address space you use inside the VNet. This IP address is the same in all regions and does not change. It is not a security risk, because only the internal Azure platform component can source a packet from this IP address.

The AzureLoadBalancer service tag identifies this source IP address in your network security groups and permits health probe traffic by default.

In addition to Load Balancer health probes, the following operations use this IP address:

  • Enables the VM Agent to communicate with the platform to signal that it is in a "Ready" state.
  • Enables communication with the DNS virtual server to provide filtered name resolution to customers that do not define custom DNS servers. This filtering ensures that customers can only resolve the hostnames of their deployment.
  • Enables the VM to obtain a dynamic IP address from the DHCP service in Azure.

Design guidance

Health probes are used to make your service resilient and allow it to scale. A misconfiguration or bad design pattern can impact the availability and scalability of your service. Review this entire document and consider what the impact on your scenario is when a probe response is marked down or marked up, and how it affects the availability of your application scenario.

When you design the health model for your application, you should probe a port on a backend endpoint that reflects the health of that instance and the application service you are providing. The application port and the probe port are not required to be the same. In some scenarios, it may be desirable for the probe port to be different from the port your application provides service on.

Sometimes it can be useful for your application to generate a health probe response that not only reflects your application health, but also signals directly to Load Balancer whether your instance should receive new flows. By failing the health probe, your application can create backpressure and throttle delivery of new flows to an instance, or prepare for maintenance and begin draining your scenario. When using Standard Load Balancer, a probe down signal will always allow TCP flows to continue until idle timeout or connection closure.

For UDP load balancing, you should generate a custom health probe signal from the backend endpoint and use either a TCP, HTTP, or HTTPS health probe targeting the corresponding listener to reflect the health of your UDP application.

When using HA Ports load-balancing rules with Standard Load Balancer, all ports are load balanced and a single health probe response must reflect the status of the entire instance.

Do not translate or proxy a health probe through the instance that receives the health probe to another instance in your VNet, as this configuration can lead to cascading failures in your scenario. Consider the following scenario: a set of third-party appliances is deployed in the backend pool of a Load Balancer resource to provide scale and redundancy for the appliances, and the health probe is configured to probe a port that the third-party appliance proxies or translates to other virtual machines behind the appliance. If you probe the same port you are using to translate or proxy requests to the other virtual machines behind the appliance, a failed probe response from a single virtual machine behind the appliance will mark the appliance itself dead. This configuration can lead to a cascading failure of the entire application scenario as a result of a single backend endpoint behind the appliance. The trigger can be an intermittent probe failure that will cause Load Balancer to mark down the original destination (the appliance instance) and in turn can disable your entire application scenario. Probe the health of the appliance itself instead. The selection of the probe to determine the health signal is an important consideration for network virtual appliance (NVA) scenarios, and you must consult your application vendor about the appropriate health signal for such scenarios.

If you don't allow the source IP of the probe in your firewall policies, the health probe will fail because it is unable to reach your instance. In turn, Load Balancer will mark down your instance due to the health probe failure. This misconfiguration can cause your load-balanced application scenario to fail.

For Load Balancer's health probe to mark your instance as up, you must allow this IP address in any Azure network security groups and local firewall policies. By default, every network security group includes the service tag AzureLoadBalancer to permit health probe traffic.

If you wish to test a health probe failure or mark down an individual instance, you can use a network security group to explicitly block the health probe (destination port or source IP) and simulate the failure of a probe.

Do not configure your VNet with the Azure-owned IP address range that contains 168.63.129.16. Such configurations will collide with the IP address of the health probe and can cause your scenario to fail.
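A quick way to check planned VNet address prefixes against this rule is to test whether any of them contains the probe source address. The sketch below uses Python's standard `ipaddress` module; the function name `conflicting_prefixes` and the sample prefixes are hypothetical.

```python
import ipaddress
from typing import Iterable, List

# Source address of all Load Balancer health probes, as stated in the text.
PROBE_SOURCE_IP = ipaddress.ip_address("168.63.129.16")


def conflicting_prefixes(address_prefixes: Iterable[str]) -> List[str]:
    """Return the prefixes (CIDR notation) that contain the probe source IP."""
    return [
        prefix
        for prefix in address_prefixes
        if PROBE_SOURCE_IP in ipaddress.ip_network(prefix, strict=False)
    ]


# Sample prefixes for illustration only.
print(conflicting_prefixes(["10.0.0.0/16", "168.63.128.0/17"]))  # ['168.63.128.0/17']
```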

If you have multiple interfaces on your VM, you need to ensure that you respond to the probe on the interface you received it on. You may need to source network address translate this address in the VM on a per-interface basis.

Do not enable TCP timestamps. Enabling TCP timestamps can cause health probes to fail due to TCP packets being dropped by the VM's guest OS TCP stack, which results in Load Balancer marking down the respective endpoint. TCP timestamps are routinely enabled by default on security-hardened VM images and must be disabled.
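On a Linux guest, the current setting can be read from the kernel's `net.ipv4.tcp_timestamps` parameter. The following sketch assumes a Linux guest OS and reads the value through `/proc`; the function name `tcp_timestamps_enabled` is hypothetical, and other guest operating systems expose this setting differently.

```python
from pathlib import Path

# Linux exposes the net.ipv4.tcp_timestamps sysctl at this path.
TCP_TIMESTAMPS_PATH = Path("/proc/sys/net/ipv4/tcp_timestamps")


def tcp_timestamps_enabled(path: Path = TCP_TIMESTAMPS_PATH) -> bool:
    """Return True unless the setting is explicitly 0 (disabled)."""
    return path.read_text().strip() != "0"
```

If this returns `True`, the setting can typically be turned off on Linux with `sysctl -w net.ipv4.tcp_timestamps=0`, and made persistent through your distribution's sysctl configuration.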

Monitoring

Both public and internal Standard Load Balancer expose health probe status per endpoint and per backend endpoint as multi-dimensional metrics through Azure Monitor. These metrics can be consumed by other Azure services or partner applications.

Limitations

  • HTTPS probes do not support mutual authentication with a client certificate.
  • Health probes will fail when TCP timestamps are enabled.

Next steps