Applies to: ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets
Performance expectations using common HPC microbenchmarks are as follows:
| Workload | HBv3 |
| --- | --- |
| STREAM Triad | 330-350 GB/s (amplified up to 630 GB/s) |
| High-Performance Linpack (HPL) | 4 TF (Rpeak, FP64), 8 TF (Rpeak, FP32) for 120-core VM size |
| RDMA latency & bandwidth | 1.2 microseconds (1 byte), 192 GB/s (one-way) |
| FIO on local NVMe SSDs (RAID0) | 7 GB/s reads, 3 GB/s writes; 186k IOPS reads, 201k IOPS writes |
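As a rough sketch of how figures like the FIO row above are typically gathered (the device names, array layout, and fio job parameters below are illustrative assumptions, not the exact configuration used to produce these numbers), the local NVMe devices can be striped into a RAID0 array and exercised with fio:

```bash
# Assumed device names; check lsblk for the actual local NVMe devices on your VM.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
# Sequential-read job against the raw array; swap --rw=read for write, randread, or randwrite as needed.
sudo fio --name=seqread --filename=/dev/md0 --rw=read --bs=1M --iodepth=32 \
    --ioengine=libaio --direct=1 --numjobs=8 --runtime=60 --time_based --group_reporting
```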
Process pinning works well on HBv3-series VMs because we expose the underlying silicon as-is to the guest VM. We strongly recommend process pinning for optimal performance and consistency.
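As a minimal illustration (the application name and NUMA node are placeholders, not a prescribed configuration), you can inspect the topology the guest reports and pin a process accordingly:

```bash
# Illustrative only: core and NUMA node IDs depend on the VM size and the topology the guest reports.
numactl --hardware                                  # list NUMA nodes and the cores in each
lscpu                                               # confirm the core-to-NUMA-node mapping
numactl --cpunodebind=0 --membind=0 ./my_hpc_app    # pin a placeholder application and its memory to NUMA node 0
```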
The MPI latency test from the OSU microbenchmark suite can be executed as shown below. Sample scripts are available on GitHub.
./bin/mpirun_rsh -np 2 -hostfile ~/hostfile MV2_CPU_MAPPING=[INSERT CORE #] ./osu_latency
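For example, a hypothetical invocation (the host names and core number 0 are placeholders) looks like this:

```bash
# Placeholder host names and core number; substitute your own nodes and the core nearest the HCA.
cat > ~/hostfile <<EOF
hbv3-node-0
hbv3-node-1
EOF
./bin/mpirun_rsh -np 2 -hostfile ~/hostfile MV2_CPU_MAPPING=0 ./osu_latency
```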
The MPI bandwidth test from the OSU microbenchmark suite can be executed as shown below. Sample scripts are available on GitHub.
./mvapich2-2.3.install/bin/mpirun_rsh -np 2 -hostfile ~/hostfile MV2_CPU_MAPPING=[INSERT CORE #] ./mvapich2-2.3/osu_benchmarks/mpi/pt2pt/osu_bw
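A simple way to locate a well-placed core is to sweep a few candidates and compare the reported bandwidth; the core numbers below are arbitrary placeholders, not a recommended mapping:

```bash
# Arbitrary candidate cores; the core closest to the InfiniBand HCA typically reports the highest bandwidth.
for core in 0 30 60 90; do
    echo "=== MV2_CPU_MAPPING=$core ==="
    ./mvapich2-2.3.install/bin/mpirun_rsh -np 2 -hostfile ~/hostfile \
        MV2_CPU_MAPPING=$core ./mvapich2-2.3/osu_benchmarks/mpi/pt2pt/osu_bw
done
```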
The Mellanox Perftest package has many InfiniBand tests such as latency (ib_send_lat) and bandwidth (ib_send_bw). An example command is below.
numactl --physcpubind=[INSERT CORE #] ib_send_lat -a
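The perftest tools run as a server/client pair; a hypothetical two-node bandwidth run (the core number and server address are placeholders) looks like this:

```bash
# On the first node (server); the core number is a placeholder.
numactl --physcpubind=0 ib_send_bw -a
# On the second node (client), pointing at the server's IP address (placeholder shown).
numactl --physcpubind=0 ib_send_bw -a 10.0.0.4
```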
- Learn about scaling MPI applications.
- Review the performance and scalability results of HPC applications on HBv3-series VMs in the TechCommunity article.
- Read about the latest announcements, HPC workload examples, and performance results at the Azure Compute Tech Community Blogs.