Application Gateway high traffic support

Note

This article describes suggested guidelines to help you set up Application Gateway to handle extra traffic during periods of high traffic volume. The alert thresholds are suggestions only and are generic in nature. Determine your own alert thresholds based on your workload and utilization expectations.

You can use Application Gateway with Web Application Firewall (WAF) for a scalable and secure way to manage traffic to your web applications.

It is important to scale your Application Gateway according to your traffic, with some buffer, so that you are prepared for traffic surges or spikes and can minimize their impact on your quality of service (QoS). The following suggestions help you set up Application Gateway with WAF to handle extra traffic.

For the complete list of metrics offered by Application Gateway, see the metrics documentation. To learn how to set alerts for metrics, see visualize metrics in the Azure portal and the Azure Monitor documentation.

Scaling for Application Gateway v1 SKU (Standard/WAF SKU)

Set your instance count based on your peak CPU usage

If you are using a v1 SKU gateway, you can scale your Application Gateway up to 32 instances. Check your Application Gateway's CPU utilization over the past month for any spikes above 80%; CPU utilization is available as a metric for you to monitor. Set your instance count according to your peak usage, with an additional 10% to 20% buffer to account for traffic spikes.

Figure: V1 CPU utilization metrics

Use the v2 SKU over v1 for its autoscaling capabilities and performance benefits

The v2 SKU offers autoscaling so that your Application Gateway can scale up as traffic increases. Compared to v1, it also offers other significant performance benefits, such as 5x better TLS offload performance, quicker deployment and update times, and zone redundancy. For more information, see the v2 documentation. To learn how to migrate your existing v1 SKU gateways to the v2 SKU, see the v1 to v2 migration documentation.

Autoscaling for Application Gateway v2 SKU (Standard_v2/WAF_v2 SKU)

Set maximum instance count to the maximum possible (125)

For the Application Gateway v2 SKU, setting the maximum instance count to the maximum possible value of 125 allows the Application Gateway to scale out as needed, so it can handle a possible increase in traffic to your applications. You are charged only for the Capacity Units (CUs) you use.

Check your subnet size and the number of available IP addresses in your subnet, and set your maximum instance count based on that. If your subnet doesn't have enough space to accommodate the instances, you must re-create your gateway in the same subnet or a different subnet that has enough capacity.
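The subnet check above can be sketched in Python. This is only an illustration under stated assumptions, not an official sizing tool: the function name `max_instances_for_subnet` and its parameters are hypothetical, and the sketch assumes that Azure reserves five addresses in every subnet and that each gateway instance consumes one private IP address (plus one more if a private frontend IP is configured). Verify these figures against the current Azure documentation before relying on them.

```python
import ipaddress

AZURE_RESERVED_IPS = 5   # assumption: addresses Azure reserves in every subnet
V2_MAX_INSTANCES = 125   # maximum instance count stated in this article


def max_instances_for_subnet(cidr: str, ips_in_use: int = 0,
                             private_frontend: bool = False) -> int:
    """Estimate the largest maximum instance count a subnet can hold.

    Assumes one private IP address per gateway instance, plus one more
    when a private frontend IP is configured.
    """
    subnet = ipaddress.ip_network(cidr, strict=True)
    usable = subnet.num_addresses - AZURE_RESERVED_IPS - ips_in_use
    if private_frontend:
        usable -= 1
    # Never report more than the v2 limit, and never a negative count.
    return max(0, min(V2_MAX_INSTANCES, usable))
```

Under these assumptions, a /24 subnet has room for the full 125 instances, while a /26 subnet does not, so a smaller maximum instance count would be needed there.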

Figure: V2 autoscaling configuration

Set your minimum instance count based on your average Compute Unit usage

For the Application Gateway v2 SKU, autoscaling takes six to seven minutes to scale out and provision an additional set of instances that are ready to take traffic. Until then, if there are short spikes in traffic, your existing gateway instances might come under stress, which can cause unexpected latency or loss of traffic.

Set your minimum instance count to an optimal level. For example, if you require 50 instances to handle the traffic at peak load, set the minimum to 25 to 30 rather than to fewer than 10. That way, even when there are short bursts of traffic, Application Gateway can handle them and give autoscaling enough time to respond and take effect.

Check your Compute Unit metric for the past month. The Compute Unit metric represents your gateway's CPU utilization. Divide your peak usage by 10 to get the minimum number of instances required, because one Application Gateway instance can handle a minimum of 10 compute units.
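The division by 10 can be written as a small Python helper. This is a minimal sketch: the function name `instances_for_compute_units` is hypothetical, and rounding up to a whole instance is an assumption made here so that a partial instance is never under-provisioned.

```python
import math

CU_PER_INSTANCE = 10  # per this article: one instance handles at least 10 compute units


def instances_for_compute_units(compute_units: float) -> int:
    """Convert a compute unit reading into a whole number of instances."""
    if compute_units < 0:
        raise ValueError("compute_units must not be negative")
    # Round up so that a fractional instance still gets a full instance.
    return math.ceil(compute_units / CU_PER_INSTANCE)
```

For example, a reading of 300 compute units over the past month corresponds to 30 instances.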

Figure: V2 compute unit metrics

Manual scaling for Application Gateway v2 SKU (Standard_v2/WAF_v2)

Set your instance count based on your peak Compute Unit usage

Unlike autoscaling, manual scaling requires you to set the number of Application Gateway instances yourself based on your traffic requirements. Set your instance count according to your peak usage, with an additional 10% to 20% buffer to account for traffic spikes. For example, if your traffic requires 50 instances at peak, provision 55 to 60 instances to handle unexpected traffic spikes.

Check your Compute Unit metric for the past month. The Compute Unit metric represents your gateway's CPU utilization. Divide your peak usage by 10 to get the number of instances required, because one Application Gateway instance can handle a minimum of 10 compute units.
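As a rough sketch of the arithmetic for manual scaling, the Python function below combines the division by 10 with the 10% to 20% buffer. The name `manual_instance_range` and its parameters are hypothetical, and rounding up at each step is an assumption made for illustration.

```python
import math

CU_PER_INSTANCE = 10  # per this article: one instance handles at least 10 compute units


def manual_instance_range(peak_compute_units: int,
                          low_pct: int = 10, high_pct: int = 20) -> tuple[int, int]:
    """Return the (low, high) instance counts for a 10% to 20% buffer."""
    base = math.ceil(peak_compute_units / CU_PER_INSTANCE)

    def with_buffer(pct: int) -> int:
        # Integer arithmetic avoids floating-point rounding surprises.
        return (base * (100 + pct) + 99) // 100

    return with_buffer(low_pct), with_buffer(high_pct)
```

A peak of 500 compute units corresponds to 50 instances, so the function returns `(55, 60)`, which matches the example in the text.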

Monitoring and alerting

To get notified of any traffic or utilization anomalies, you can set up alerts on certain metrics. For the complete list of metrics offered by Application Gateway, see the metrics documentation. To learn how to set alerts for metrics, see visualize metrics in the Azure portal and the Azure Monitor documentation.

Alerts for Application Gateway v1 SKU (Standard/WAF)

Alert if average CPU utilization crosses 80%

Under normal conditions, CPU usage should not regularly exceed 90%, because this can cause latency in the websites hosted behind the Application Gateway and disrupt the client experience. You can indirectly control or improve CPU utilization by modifying the Application Gateway configuration: increase the instance count, move to a larger SKU size, or do both. Set an alert that fires when the average CPU utilization metric goes above 80%.

Alert if Unhealthy host count crosses threshold

This metric indicates the number of backend servers that Application Gateway is unable to probe successfully. It catches issues where Application Gateway instances are unable to connect to the backend. Alert if this number goes above 20% of backend capacity. For example, if you currently have 30 backend servers in your backend pool, set an alert if the unhealthy host count goes above 6.
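The 20% rule can be expressed as a short Python sketch. The function names are hypothetical, and rounding the threshold down to a whole host is an assumption made here; the source only gives the 30-server example.

```python
def unhealthy_host_threshold(backend_count: int, percent: int = 20) -> int:
    """Return the unhealthy host count above which an alert should fire."""
    if backend_count < 0:
        raise ValueError("backend_count must not be negative")
    # Integer division rounds the threshold down to a whole host (assumption).
    return backend_count * percent // 100


def should_alert(unhealthy_hosts: int, backend_count: int) -> bool:
    """True when the unhealthy host count exceeds 20% of the backend pool."""
    return unhealthy_hosts > unhealthy_host_threshold(backend_count)
```

With 30 backend servers the threshold is 6, so 6 unhealthy hosts do not trigger the alert but 7 do.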

Alert if Response status (4xx, 5xx) crosses threshold

Create an alert when the Application Gateway response status is 4xx or 5xx. Occasional 4xx or 5xx responses can occur because of transient issues, so observe the gateway in production to determine a static threshold, or use a dynamic threshold for the alert.

Alert if Failed requests crosses threshold

Create an alert when the Failed requests metric crosses a threshold. Observe the gateway in production to determine a static threshold, or use a dynamic threshold for the alert.

Example: Setting up an alert for more than 100 failed requests in the last 5 minutes

This example shows how to use the Azure portal to set up an alert that fires when the failed request count in the last 5 minutes is more than 100.

  1. Navigate to your Application Gateway.
  2. On the left panel, select Metrics under the Monitoring tab.
  3. Add a metric for Failed requests.
  4. Select New alert rule and define your condition and actions.
  5. Select Create alert rule to create and enable the alert.
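To make the alert condition concrete, the following Python sketch evaluates the same rule (more than 100 failed requests in the last 5 minutes) over a list of samples. It only illustrates the logic and does not call Azure Monitor or describe how Azure Monitor evaluates rules; the function name `failed_requests_alert` and the `(timestamp, count)` sample format are assumptions made for this example.

```python
from datetime import datetime, timedelta


def failed_requests_alert(samples, now: datetime,
                          window: timedelta = timedelta(minutes=5),
                          threshold: int = 100) -> bool:
    """Return True when failed requests within the window exceed the threshold.

    samples: iterable of (timestamp, failed_request_count) pairs (hypothetical format).
    """
    window_start = now - window
    # Sum only the samples that fall inside the look-back window.
    total = sum(count for ts, count in samples if window_start < ts <= now)
    return total > threshold
```

Samples older than the 5-minute window are ignored, and a total of exactly 100 does not fire the alert because the condition is "more than 100".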

Figure: V2 - Create alert

Alerts for Application Gateway v2 SKU (Standard_v2/WAF_v2)

Alert if Compute Unit utilization crosses 75% of average usage

The compute unit is the measure of compute utilization of your Application Gateway. Check your average compute unit usage over the past month and set an alert if usage crosses 75% of that average. For example, if your average usage is 10 compute units, set an alert at 7.5 compute units. The alert notifies you when usage is increasing and gives you time to respond. If you expect the higher traffic to be sustained, you can raise the minimum instance count. Follow the scaling suggestions above to scale out as necessary.

Example: Setting up an alert on 75% of average compute unit usage

This example shows how to use the Azure portal to set up an alert that fires when 75% of average compute unit usage is reached.

  1. Navigate to your Application Gateway.
  2. On the left panel, select Metrics under the Monitoring tab.
  3. Add a metric for Average Current Compute Units.
  4. If you've set your minimum instance count to match your average compute unit usage, set an alert for when 75% of your minimum instances are in use. For example, if your average usage is 10 compute units, set an alert at 7.5 compute units. The alert notifies you when usage is increasing and gives you time to respond. If you expect the higher traffic to be sustained, you can raise the minimum instance count.

Figure: V2 compute unit alerts

Note

You can set the alert to fire at a lower or higher compute unit utilization percentage, depending on how sensitive you want to be to potential traffic spikes.
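The threshold arithmetic, including the adjustable percentage mentioned in the note, is sketched below in Python. The function name `usage_alert_threshold` and its parameters are hypothetical.

```python
def usage_alert_threshold(baseline_units: float, percent: float = 75) -> float:
    """Return the usage level at which the alert should fire.

    baseline_units: the average usage observed over the past month.
    percent: fraction of the baseline to alert on; lower it for a more
             sensitive alert, raise it for a less sensitive one.
    """
    if baseline_units < 0 or percent < 0:
        raise ValueError("baseline_units and percent must not be negative")
    return baseline_units * percent / 100
```

With an average of 10 compute units and the default 75%, the threshold is 7.5 compute units, as in the example above.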

Alert if Capacity Unit utilization crosses 75% of peak usage

Capacity units represent overall gateway utilization in terms of throughput, compute, and connection count. Check your maximum capacity unit usage over the past month and set an alert if usage crosses 75% of that maximum. For example, if your maximum usage is 100 capacity units, set an alert at 75 capacity units. Follow the two suggestions above to scale out as necessary.

Alert if Unhealthy host count crosses threshold

This metric indicates the number of backend servers that Application Gateway is unable to probe successfully. It catches issues where Application Gateway instances are unable to connect to the backend. Alert if this number goes above 20% of backend capacity. For example, if you currently have 30 backend servers in your backend pool, set an alert if the unhealthy host count goes above 6.

Alert if Response status (4xx, 5xx) crosses threshold

Create an alert when the Application Gateway response status is 4xx or 5xx. Occasional 4xx or 5xx responses can occur because of transient issues, so observe the gateway in production to determine a static threshold, or use a dynamic threshold for the alert.

Alert if Failed requests crosses threshold

Create an alert when the Failed requests metric crosses a threshold. Observe the gateway in production to determine a static threshold, or use a dynamic threshold for the alert.

Alert if Backend last byte response time crosses threshold

This metric indicates the time interval between the start of establishing a connection to the backend server and receiving the last byte of the response body. Create an alert that fires when the backend response latency exceeds its usual value by more than a certain threshold. For example, set the alert to fire when backend response latency increases by more than 30% from the usual value.

Alert if Application Gateway total time crosses threshold

This metric is the interval from the time when Application Gateway receives the first byte of the HTTP request to the time when the last response byte has been sent to the client. Create an alert that fires when the total time exceeds its usual value by more than a certain threshold. For example, set the alert to fire when total time latency increases by more than 30% from the usual value.
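The "more than 30% above the usual value" condition used for both latency metrics can be sketched in Python as follows. The function name `latency_alert` is hypothetical, and how you derive the usual value (for example, from a monthly average) is left as an assumption.

```python
def latency_alert(current_ms: float, usual_ms: float,
                  max_increase_pct: float = 30) -> bool:
    """Return True when latency exceeds the usual value by more than the given percentage."""
    if usual_ms <= 0:
        raise ValueError("usual_ms must be positive")
    # Compare scaled values instead of dividing, which keeps integer inputs exact.
    return current_ms * 100 > usual_ms * (100 + max_increase_pct)
```

With a usual latency of 200 ms, a reading of 270 ms (35% higher) fires the alert, while 250 ms (25% higher) does not.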

Set up WAF with geo filtering and bot protection to stop attacks

If you want an extra layer of security in front of your application, use the Application Gateway WAF_v2 SKU for WAF capabilities. You can configure the v2 SKU to allow access to your applications only from a given country/region or set of countries/regions. Set up a WAF custom rule to explicitly allow or block traffic based on geolocation. For more information, see how to configure custom rules on Application Gateway WAF_v2 SKU through PowerShell.

Enable bot protection to block known bad bots. This should reduce the amount of traffic reaching your application. For more information, see bot protection with setup instructions.

Turn on diagnostics on Application Gateway and WAF

Diagnostic logs allow you to view firewall logs, performance logs, and access logs. You can use these logs in Azure to manage and troubleshoot Application Gateways. For more information, see the diagnostics documentation.

Set up a TLS policy for extra security

Ensure you're using the latest TLS policy version (AppGwSslPolicy20170401S). This policy enforces TLS 1.2 and stronger ciphers. For more information, see configuring TLS policy versions and cipher suites via PowerShell.