Horovod Performance on GPUs with MVAPICH2-GDR

Machine Specifications

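| CPU Model | CPU Core Info | Memory | IB Card | OS | OFED | GPU | CUDA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IBM POWER9 | 2x22 @ 2.3 GHz | 256 GB | Mellanox EDR (100 Gbps) | RHEL 7.6 | MOFED 4.7-3.2.9.1 | NVIDIA V100 (6/Node) | CUDA 9.2 |

| Model | Batch Size | Benchmark | DL Framework |
| --- | --- | --- | --- |
| ResNet-50 | 64 | tf_cnn_benchmarks | TF v1.14 |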

Machine Specifications

| CPU Model | CPU Core Info | Memory | IB Card | OS | OFED | GPU | CUDA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IBM POWER9 | 2x20 @ 2.3 GHz | 256 GB | Mellanox EDR (100 Gbps) | RHEL 7.6 | MOFED 4.5-2.2.9.0 | NVIDIA V100 (4/Node) | CUDA 10.2 |

| Model | Batch Size | Benchmark | DL Framework |
| --- | --- | --- | --- |
| ResNet-50 | 64 | tensorflow2_synthetic_benchmark | TF v2.1 |

Machine Specifications

| CPU Model | CPU Core Info | Memory | IB Card | OS | OFED | GPU | CUDA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IBM POWER9 | 2x22 @ 2.3 GHz | 256 GB | Mellanox EDR (100 Gbps) | RHEL 7.6 | MOFED 4.5-2.2.9.0 | NVIDIA V100 (4/Node) | CUDA 10.1 |

| Model | Batch Size | Benchmark | DL Framework |
| --- | --- | --- | --- |
| ResNet-50 | 64 | pytorch_synthetic_benchmarks | PyTorch v1.5.0 |

Machine Specifications

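| CPU Model | CPU Core Info | OS | GPU | CUDA |
| --- | --- | --- | --- | --- |
| Intel Xeon Platinum 8168 | 2x24 @ 2.7 GHz | Ubuntu | NVIDIA V100 (16/Node) | CUDA 9.2 |

| Model | Batch Size | Benchmark | DL Framework |
| --- | --- | --- | --- |
| ResNet-50 | 64 | tf_cnn_benchmarks | TF v1.14 |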