How to validate VPN throughput to a virtual network

A VPN gateway connection enables you to establish secure, cross-premises connectivity between your virtual network within Azure and your on-premises IT infrastructure.

This article shows how to validate network throughput from the on-premises resources to an Azure virtual machine (VM).

Note

This article is intended to help diagnose and fix common issues. If you're unable to solve the issue by using the following information, contact support.

Overview

The VPN gateway connection involves the following components:

  • On-premises VPN device (view a list of validated VPN devices)
  • Public internet
  • Azure VPN gateway
  • Azure VM

The following diagram shows the logical connectivity of an on-premises network to an Azure virtual network through VPN.

(Diagram: logical connection from a customer network to the Microsoft network through VPN)

Calculate the maximum expected ingress/egress

  1. Determine your application's baseline throughput requirements.
  2. Determine your Azure VPN gateway throughput limits. For help, see the "Gateway SKUs" section of About VPN Gateway.
  3. Determine the Azure VM throughput guidance for your VM size.
  4. Determine your Internet Service Provider (ISP) bandwidth.
  5. Calculate your expected throughput by taking the smallest of the VM, VPN gateway, and ISP bandwidths, which are measured in megabits per second (Mbps), and dividing that value by eight (8) to get megabytes per second (MB/s). A worked example follows this list.
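
For example, with hypothetical numbers: if the VM size supports 1,000 Mbps, the VPN gateway SKU is rated at 650 Mbps, and the ISP link provides 500 Mbps, the expected throughput is:

min(1,000, 650, 500) = 500 Mbps
500 Mbps / 8 = 62.5 MB/s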

If your calculated throughput does not meet your application's baseline throughput requirements, you must increase the bandwidth of the resource that you identified as the bottleneck. To resize an Azure VPN gateway, see Changing a gateway SKU. To resize a virtual machine, see Resize a VM. If you are not experiencing the expected internet bandwidth, you may also contact your ISP.

Note

VPN gateway throughput is an aggregate of all Site-to-Site/VNet-to-VNet or Point-to-Site connections.

Validate network throughput by using performance tools

Perform this validation during non-peak hours, because VPN tunnel throughput saturation during testing does not give accurate results.

The tool we use for this test is iPerf, which works on both Windows and Linux and has both client and server modes. It is limited to 3 Gbps for Windows VMs.

This tool does not perform any read/write operations to disk. It solely produces self-generated TCP traffic from one end to the other. It generates statistics based on experimentation that measures the bandwidth available between the client and server nodes. When testing between two nodes, one node acts as the server and the other node acts as the client. Once this test is completed, we recommend that you reverse the roles of the nodes to test both upload and download throughput on both nodes.

Download iPerf

Download iPerf. For details, see the iPerf documentation.

Note

The third-party products discussed in this article are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, about the performance or reliability of these products.

Run iPerf (iperf3.exe)

  1. Enable an NSG/ACL rule allowing the traffic (for public IP address testing on an Azure VM). An example rule is sketched after this procedure.

  2. On both nodes, enable a firewall exception for port 5001.

    Windows: Run the following command as an administrator:

    netsh advfirewall firewall add rule name="Open Port 5001" dir=in action=allow protocol=TCP localport=5001
    

    To remove the rule when testing is complete, run this command:

    netsh advfirewall firewall delete rule name="Open Port 5001" protocol=TCP localport=5001
    

    Azure Linux: Azure Linux images have permissive firewalls. If there is an application listening on a port, the traffic is allowed through. Custom images that are secured may need ports opened explicitly. Common Linux OS-layer firewalls include iptables, ufw, and firewalld. Example commands for opening the test port are sketched after this procedure.

  3. On the server node, change to the directory where iperf3.exe is extracted. Then run iPerf in server mode and set it to listen on port 5001, as shown in the following commands:

    cd c:\iperf-3.1.2-win64
    
    iperf3.exe -s -p 5001
    

    Note

    Port 5001 is customizable to account for particular firewall restrictions in your environment.

  4. On the client node, change to the directory where the iperf tool is extracted, and then run the following command:

    iperf3.exe -c <IP of the iperf Server> -t 30 -p 5001 -P 32
    

    The client directs 30 seconds of traffic on port 5001 to the server. The flag '-P 32' indicates that we are making 32 simultaneous connections to the server node.

    The following screen shows the output from this example:

    (Screenshot: example iPerf output)

  5. (OPTIONAL) To preserve the testing results, run this command:

    iperf3.exe -c IPofTheServerToReach -t 30 -p 5001 -P 32  >> output.txt
    
  6. After completing the previous steps, execute the same steps with the roles reversed, so that the server node becomes the client node, and vice versa.
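
For step 1, if the test traffic reaches the Azure VM over its public IP address, you can open the test port in the network security group with the Azure CLI. This is a hedged sketch: the resource group and NSG names are placeholders, and you should adjust the priority to fit your existing rules:

az network nsg rule create --resource-group MyResourceGroup --nsg-name MyNsg --name Allow-iPerf-5001 --priority 200 --direction Inbound --access Allow --protocol Tcp --destination-port-ranges 5001

For step 2, on a secured custom Linux image the test port may need to be opened explicitly. For example, depending on which firewall the image uses (run only the command that matches your image):

sudo ufw allow 5001/tcp
sudo firewall-cmd --add-port=5001/tcp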

Note

iPerf is not the only tool. NTTTCP is an alternative solution for testing.

Test VMs running Windows

Load Latte.exe onto the VMs

Download the latest version of Latte.exe.

Consider putting Latte.exe in a separate folder, such as c:\tools.

Allow Latte.exe through the Windows firewall

On the receiver, create an Allow rule in the Windows Firewall to allow the Latte.exe traffic to arrive. It's easiest to allow the entire Latte.exe program by name rather than to allow specific TCP ports inbound.

Allow Latte.exe through the Windows Firewall like this:

netsh advfirewall firewall add rule program=<PATH>\latte.exe name="Latte" protocol=any dir=in action=allow enable=yes profile=ANY

For example, if you copied latte.exe to the c:\tools folder, this would be the command:

netsh advfirewall firewall add rule program=c:\tools\latte.exe name="Latte" protocol=any dir=in action=allow enable=yes profile=ANY

Run latency tests

Start latte.exe on the RECEIVER (run from CMD, not from PowerShell):

latte -a <Receiver IP address>:<port> -i <iterations>

Around 65,000 iterations are enough to return representative results.

Any available port number is fine.

If the VM has an IP address of 10.0.0.4, the command looks like this:

latte -a 10.0.0.4:5005 -i 65100

Start latte.exe on the SENDER (run from CMD, not from PowerShell):

latte -c -a <Receiver IP address>:<port> -i <iterations>

The resulting command is the same as on the receiver, except with the addition of "-c" to indicate that this is the "client", or sender:

latte -c -a 10.0.0.4:5005 -i 65100

Wait for the results. Depending on how far apart the VMs are, the command could take a few minutes to complete. Consider starting with fewer iterations to test for success before running longer tests.

Test VMs running Linux

Use SockPerf to test VMs.

Install SockPerf on the VMs

On the Linux VMs (both SENDER and RECEIVER), run these commands to prepare SockPerf on your VMs:

CentOS/RHEL - Install GIT and other helpful tools

sudo yum install gcc -y -q
sudo yum install git -y -q
sudo yum install gcc-c++ -y
sudo yum install ncurses-devel -y
sudo yum install -y automake

Ubuntu - Install GIT and other helpful tools

sudo apt-get install build-essential -y
sudo apt-get install git -y -q
sudo apt-get install -y autotools-dev
sudo apt-get install -y automake

Bash - all distros

From the bash command line (assumes git is installed):

git clone https://github.com/mellanox/sockperf
cd sockperf/
./autogen.sh
./configure --prefix=

make is slower and may take several minutes:

make

make install is fast:

sudo make install

Run SockPerf on the VMs

Sample commands after installation. Server/Receiver - assumes the server's IP is 10.0.0.4:

sudo sockperf sr --tcp -i 10.0.0.4 -p 12345 --full-rtt

Client - assumes the server's IP is 10.0.0.4:

sockperf ping-pong -i 10.0.0.4 --tcp -m 1400 -t 101 -p 12345 --full-rtt

Note

Make sure there are no intermediate hops (for example, a virtual appliance) during the throughput testing between the VM and the gateway. If the iPerf/NTTTCP tests above return poor results in terms of overall throughput, see the following article to understand the key factors behind the possible root causes of the problem: https://docs.azure.cn/virtual-network/virtual-network-tcpip-performance-tuning

In particular, analyzing packet capture traces (Wireshark/Network Monitor) collected in parallel from the client and the server during those tests helps in assessing bad performance. These traces can reveal packet loss, high latency, MTU size issues, fragmentation, TCP zero window, out-of-order fragments, and so on.
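
On the Linux side, one way to collect such a trace in parallel with a test run is tcpdump. The following is a minimal sketch; the interface name (eth0), the peer address, and the port are placeholders for your environment:

sudo tcpdump -i eth0 -w vpn-test.pcap host 10.0.0.4 and port 5001

Open the resulting vpn-test.pcap file in Wireshark or Network Monitor to look for retransmissions, zero-window events, and fragmentation.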

Address slow file copy issues

Even if the overall throughput assessed with the previous steps (iPerf/NTTTCP/etc.) was good, you might experience slow file copying when using Windows Explorer or dragging and dropping through an RDP session. This problem is normally due to one or both of the following factors:

  • File copy applications, such as Windows Explorer and RDP, do not use multiple threads when copying files. For better performance, use a multi-threaded file copy application such as Richcopy to copy files by using 16 or 32 threads. To change the thread number for file copy in Richcopy, click Action > Copy options > File copy. A built-in multi-threaded alternative is sketched after this list.

    (Screenshot: slow file copy issue)

    Note

    Not all applications work the same way, and not all applications or processes use all of the threads. If you run the test, you might see some threads being empty, which won't provide accurate throughput results. To check your application's file transfer performance, increase or decrease the number of threads in succession to find the optimal throughput of the application or file transfer.

  • Insufficient VM disk read/write speed. For more information, see Azure Storage troubleshooting.
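
As one multi-threaded option that ships with Windows, robocopy accepts the /MT switch (for example, /MT:32 for 32 threads). This is a hedged alternative to Richcopy, and the source and destination paths below are placeholders:

robocopy C:\data \\10.0.0.4\share /E /MT:32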

On-premises device external-facing interface

On the local network gateway, specify the subnets of the on-premises ranges that you would like Azure to reach via VPN. At the same time, define the Azure VNet address space on the on-premises device.

  • Route-based gateway: The policy or traffic selector for route-based VPNs is configured as any-to-any (or wildcards).

  • Policy-based gateway: Policy-based VPNs encrypt and direct packets through IPsec tunnels based on the combinations of address prefixes between your on-premises network and the Azure VNet. The policy (or traffic selector) is usually defined as an access list in the VPN configuration.

  • UsePolicyBasedTrafficSelectors connections: Setting "UsePolicyBasedTrafficSelectors" to $True on a connection configures the Azure VPN gateway to connect to a policy-based VPN firewall on premises. If you enable PolicyBasedTrafficSelectors, you need to ensure that your VPN device has matching traffic selectors defined with all combinations of your on-premises network (local network gateway) prefixes to and from the Azure virtual network prefixes, instead of any-to-any. An example follows this list.
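
For example, the "$True" setting above corresponds to Azure PowerShell. The following is a hedged sketch for an existing connection; the resource names are placeholders, and you should verify the parameters against the Set-AzVirtualNetworkGatewayConnection reference (enabling this option is typically combined with an explicit IPsec policy):

$connection = Get-AzVirtualNetworkGatewayConnection -Name MyS2SConnection -ResourceGroupName MyResourceGroup
Set-AzVirtualNetworkGatewayConnection -VirtualNetworkGatewayConnection $connection -UsePolicyBasedTrafficSelectors $True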

Inappropriate configuration may lead to frequent disconnects within the tunnel, packet drops, bad throughput, and latency.

Check latency

You can check latency by using the following tools:

  • WinMTR
  • TCPTraceroute
  • ping and psping (These tools can provide a good estimate of RTT, but they can't be used in all cases.) A psping example follows this list.
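
For instance, psping can measure TCP round-trip latency against the same port used for the iPerf test. This is a hedged sketch; the IP address and port are placeholders:

psping -n 100 10.0.0.4:5001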

(Screenshot: latency check)

If you notice a high latency spike at any of the hops before traffic enters the Microsoft network backbone, you may want to proceed with further investigation with your internet service provider.

If a large, unusual latency spike is noticed on hops within "msn.net", contact Microsoft support for further investigation.

Next steps

For more information or help, check out the following link: