Set up DPDK in a Linux virtual machine

Data Plane Development Kit (DPDK) on Azure offers a faster user-space packet processing framework for performance-intensive applications. This framework bypasses the virtual machine's kernel network stack.

In typical packet processing that uses the kernel network stack, the process is interrupt-driven. When the network interface receives incoming packets, a kernel interrupt processes the packet and a context switch occurs from kernel space to user space. DPDK eliminates context switching and the interrupt-driven method in favor of a user-space implementation that uses poll mode drivers for fast packet processing.

DPDK consists of sets of user-space libraries that provide access to lower-level resources. These resources can include hardware, logical cores, memory management, and poll mode drivers for network interface cards.

DPDK can run on Azure virtual machines and supports multiple operating system distributions. DPDK provides key performance differentiation in driving network function virtualization implementations. These implementations can take the form of network virtual appliances (NVAs), such as virtual routers, firewalls, VPNs, load balancers, evolved packet cores, and denial-of-service (DDoS) applications.

Benefit

Higher packets per second (PPS): Bypassing the kernel and taking control of packets in user space reduces the cycle count by eliminating context switches. It also improves the rate of packets processed per second in Azure Linux virtual machines.

Supported operating systems

The following distributions from the Azure Gallery are supported:

Linux OS       Kernel version
Ubuntu 16.04   4.15.0-1015-azure
Ubuntu 18.04   4.15.0-1015-azure
SLES 15        4.12.14-5.5-azure
CentOS 7.5     3.10.0-862.3.3.el7

Custom kernel support

For any Linux kernel version that's not listed, see Patches for building an Azure-tuned Linux kernel. For more information, you can also contact Azure Support.

Region support

All Azure regions support DPDK.

Prerequisites

Accelerated networking must be enabled on a Linux virtual machine. The virtual machine should have at least two network interfaces, with one interface for management. Learn how to create a Linux virtual machine with accelerated networking enabled.
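
The linked article walks through the full process. As a minimal sketch only, a VM with a management NIC and an accelerated data NIC could be created with the Azure CLI roughly as follows; the resource group, virtual network, subnet, NIC and VM names, VM size, and image are all assumptions for illustration, and the virtual network and subnet are assumed to exist already:

# Sketch: one management NIC plus one accelerated data NIC, then a VM that uses both (names/size/image assumed)
az network nic create --resource-group myrg --name mgmt-nic \
  --vnet-name myvnet --subnet mysubnet
az network nic create --resource-group myrg --name data-nic \
  --vnet-name myvnet --subnet mysubnet --accelerated-networking true
az vm create --resource-group myrg --name dpdk-vm \
  --size Standard_DS4_v2 --image UbuntuLTS \
  --admin-username azureuser --generate-ssh-keys \
  --nics mgmt-nic data-nic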

Install DPDK dependencies

Ubuntu 16.04

sudo add-apt-repository ppa:canonical-server/dpdk-azure -y
sudo apt-get update
sudo apt-get install -y librdmacm-dev librdmacm1 build-essential libnuma-dev libmnl-dev

Ubuntu 18.04

sudo apt-get update
sudo apt-get install -y librdmacm-dev librdmacm1 build-essential libnuma-dev libmnl-dev

CentOS 7.5

sudo yum -y groupinstall "Infiniband Support"
sudo dracut --add-drivers "mlx4_en mlx4_ib mlx5_ib" -f
sudo yum install -y gcc kernel-devel-`uname -r` numactl-devel.x86_64 librdmacm-devel libmnl-devel

SLES 15

Azure kernel

zypper  \
  --no-gpg-checks \
  --non-interactive \
  --gpg-auto-import-keys install kernel-azure kernel-devel-azure gcc make libnuma-devel numactl librdmacm1 rdma-core-devel

Default kernel

zypper \
  --no-gpg-checks \
  --non-interactive \
  --gpg-auto-import-keys install kernel-default-devel gcc make libnuma-devel numactl librdmacm1 rdma-core-devel

Set up the virtual machine environment (once)

  1. Download the latest DPDK. Version 18.02 or higher is required for Azure. (A consolidated sketch of these build steps appears after this list.)
  2. Build the default config with make config T=x86_64-native-linuxapp-gcc.
  3. Enable Mellanox PMDs in the generated config with sed -ri 's,(MLX._PMD=)n,\1y,' build/.config.
  4. Compile with make.
  5. Install with make install DESTDIR=<output folder>.
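
As referenced above, the following is a minimal end-to-end sketch of the build. The release archive URL, the version (18.02.2), and the install destination are assumptions; substitute whatever version and output folder you actually use:

# Sketch: fetch, configure, and build DPDK with the Mellanox PMDs enabled (URL/version/paths assumed)
wget https://fast.dpdk.org/rel/dpdk-18.02.2.tar.xz
tar xf dpdk-18.02.2.tar.xz && cd dpdk-18.02.2
make config T=x86_64-native-linuxapp-gcc        # generate the default build config
sed -ri 's,(MLX._PMD=)n,\1y,' build/.config     # enable the MLX4/MLX5 poll mode drivers
make                                            # compile
make install DESTDIR=/opt/dpdk                  # assumed output folder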

Configure the runtime environment

After restarting, run the following commands once:

  1. Hugepages

    • Configure hugepages by running the following command once for all NUMA nodes (a consolidated sketch of these runtime steps appears after this list):

      echo 1024 | sudo tee /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages
      
    • Create a directory for mounting with mkdir /mnt/huge.

    • Mount hugepages with mount -t hugetlbfs nodev /mnt/huge.

    • Check that hugepages are reserved with grep Huge /proc/meminfo.

      Note

      You can modify the grub file so that hugepages are reserved at boot time by following the DPDK instructions, which are at the bottom of the page. When you're using an Azure Linux virtual machine, modify files under /etc/config/grub.d instead, to reserve hugepages across reboots.

  2. MAC and IP addresses: Use ifconfig -a to view the MAC and IP addresses of the network interfaces. The VF network interface and the NETVSC network interface have the same MAC address, but only the NETVSC network interface has an IP address. VF interfaces run as subordinate interfaces of NETVSC interfaces.

  3. PCI addresses

    • Use ethtool -i <vf interface name> to find out which PCI address to use for the VF.
    • If eth0 has accelerated networking enabled, make sure that testpmd doesn't accidentally take over the VF PCI device for eth0. If the DPDK application accidentally takes over the management network interface and causes you to lose your SSH connection, use the serial console to stop the DPDK application. You can also use the serial console to stop or start the virtual machine.
  4. Load ib_uverbs on each reboot with modprobe -a ib_uverbs. For SLES 15 only, also load mlx4_ib with modprobe -a mlx4_ib.
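
As referenced above, here is a minimal consolidated sketch of the runtime setup. The interface name eth1 stands in for your VF interface and is an assumption; substitute the names and values reported on your VM:

# Sketch of the one-time runtime setup (eth1 is an assumed VF interface name)
echo 1024 | sudo tee /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages
sudo mkdir -p /mnt/huge
sudo mount -t hugetlbfs nodev /mnt/huge
grep Huge /proc/meminfo                  # confirm that hugepages are reserved
ifconfig -a                              # VF and NETVSC interfaces share a MAC; only NETVSC has an IP
ethtool -i eth1 | grep bus-info          # PCI address of the VF interface
sudo modprobe -a ib_uverbs               # repeat on every reboot
# sudo modprobe -a mlx4_ib               # SLES 15 only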

Failsafe PMD

DPDK applications must run over the failsafe PMD that is exposed in Azure. If the application runs directly over the VF PMD, it doesn't receive all packets that are destined for the VM, because some packets show up over the synthetic interface.

Running a DPDK application over the failsafe PMD guarantees that the application receives all packets that are destined for it. It also makes sure that the application keeps running in DPDK mode, even if the VF is revoked while the host is being serviced. For more information about the failsafe PMD, see Fail-safe poll mode driver library.

Run testpmd

To run testpmd in root mode, use sudo before the testpmd command.

Basic: Sanity check, failsafe adapter initialization

  1. Run the following command to start a single-port testpmd application:

    testpmd -w <pci address from previous step> \
      --vdev="net_vdev_netvsc0,iface=eth1" \
      -- -i \
      --port-topology=chained
    
  2. Run the following command to start a dual-port testpmd application:

    testpmd -w <pci address nic1> \
      -w <pci address nic2> \
      --vdev="net_vdev_netvsc0,iface=eth1" \
      --vdev="net_vdev_netvsc1,iface=eth2" \
      -- -i
    

    If you're running testpmd with more than two NICs, the --vdev argument follows this pattern: net_vdev_netvsc<id>,iface=<vf's pairing eth>. See the sketch that follows this procedure for a three-NIC example.

  3. After testpmd starts, run show port info all to check port information. You should see one or two DPDK ports that are net_failsafe (not net_mlx4).

  4. Use start <port> and stop <port> to start and stop traffic.

The previous commands start testpmd in interactive mode, which is the recommended mode for trying out testpmd commands.
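
As referenced in step 2, here is a sketch of a three-NIC invocation that follows the same --vdev pattern. The PCI addresses and the interface names eth1, eth2, and eth3 are placeholders for the values on your VM:

testpmd -w <pci address nic1> \
  -w <pci address nic2> \
  -w <pci address nic3> \
  --vdev="net_vdev_netvsc0,iface=eth1" \
  --vdev="net_vdev_netvsc1,iface=eth2" \
  --vdev="net_vdev_netvsc2,iface=eth3" \
  -- -i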

Basic: Single sender/single receiver

The following commands periodically print the packets-per-second statistics:

  1. On the TX side, run the following command:

    testpmd \
      -l <core-list> \
      -n <num of mem channels> \
      -w <pci address of the device you plan to use> \
      --vdev="net_vdev_netvsc<id>,iface=<the iface to attach to>" \
      -- --port-topology=chained \
      --nb-cores <number of cores to use for test pmd> \
      --forward-mode=txonly \
      --eth-peer=<port id>,<receiver peer MAC address> \
      --stats-period <display interval in seconds>
    
  2. On the RX side, run the following command:

    testpmd \
      -l <core-list> \
      -n <num of mem channels> \
      -w <pci address of the device you plan to use> \
      --vdev="net_vdev_netvsc<id>,iface=<the iface to attach to>" \
      -- --port-topology=chained \
      --nb-cores <number of cores to use for test pmd> \
      --forward-mode=rxonly \
      --eth-peer=<port id>,<sender peer MAC address> \
      --stats-period <display interval in seconds>
    

When you're running the previous commands on a virtual machine, change IP_SRC_ADDR and IP_DST_ADDR in app/test-pmd/txonly.c to match the actual IP addresses of the virtual machines before you compile. Otherwise, the packets are dropped before they reach the receiver.
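
As a minimal sketch of that edit, the two defines could be rewritten before compiling, for example with sed. The addresses 10.0.0.4 and 10.0.0.5 and the byte-shifted form of the macros are assumptions; check how the defines actually appear in your copy of txonly.c first:

# Sketch: point txonly at sender 10.0.0.4 and receiver 10.0.0.5 (assumed addresses and macro format)
sed -i 's/#define IP_SRC_ADDR .*/#define IP_SRC_ADDR ((10U << 24) | (0 << 16) | (0 << 8) | 4)/' app/test-pmd/txonly.c
sed -i 's/#define IP_DST_ADDR .*/#define IP_DST_ADDR ((10U << 24) | (0 << 16) | (0 << 8) | 5)/' app/test-pmd/txonly.c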

Advanced: Single sender/single forwarder

The following commands periodically print the packets-per-second statistics:

  1. On the TX side, run the following command:

    testpmd \
      -l <core-list> \
      -n <num of mem channels> \
      -w <pci address of the device you plan to use> \
      --vdev="net_vdev_netvsc<id>,iface=<the iface to attach to>" \
      -- --port-topology=chained \
      --nb-cores <number of cores to use for test pmd> \
      --forward-mode=txonly \
      --eth-peer=<port id>,<receiver peer MAC address> \
      --stats-period <display interval in seconds>
    
  2. On the FWD side, run the following command (you need as many --vdev arguments as the number of devices that testpmd uses; in this case, two):

    testpmd \
      -l <core-list> \
      -n <num of mem channels> \
      -w <pci address NIC1> \
      -w <pci address NIC2> \
      --vdev="net_vdev_netvsc<id>,iface=<the iface to attach to>" \
      --vdev="net_vdev_netvsc<2nd id>,iface=<2nd iface to attach to>" (you need as many --vdev arguments as the number of devices used by testpmd, in this case) \
      -- --nb-cores <number of cores to use for test pmd> \
      --forward-mode=io \
      --eth-peer=<recv port id>,<sender peer MAC address> \
      --stats-period <display interval in seconds>
    

When you're running the previous commands on a virtual machine, change IP_SRC_ADDR and IP_DST_ADDR in app/test-pmd/txonly.c to match the actual IP addresses of the virtual machines before you compile. Otherwise, the packets are dropped before they reach the forwarder. You can't have a third machine receive the forwarded traffic, because the testpmd forwarder doesn't modify the layer-3 addresses unless you make code changes.

References