Horovod Performance on CPUs with MVAPICH2-X

Machine Specifications (Frontera)

CPU Model: Intel Xeon Platinum 8280
CPU Core Info: 2x28 @ 2.7 GHz
Memory: 192 GB
IB Card: Mellanox HDR (200 Gbps)
OS: CentOS 7.6
OFED: 4.6-1.0.1

TensorFlow Performance on Frontera

Model: ResNet-50 Batch Size: 64 Benchmark: tf_cnn_benchmark
DL Framework: Intel-TF v1.14 ppn: 4
TensorFlow on Frontera
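
As a rough illustration of how the settings above combine, the following sketch computes the number of MPI processes and the global batch size for a data-parallel Horovod run. It assumes the listed batch size is per process (tf_cnn_benchmarks treats its batch size as per device) and that ppn means processes per node. The function name and the node counts are hypothetical and are not taken from the measurements on this page.

```python
def horovod_job_size(nodes, ppn, batch_per_process):
    """Return (total MPI processes, global batch size) for a data-parallel run."""
    if nodes < 1 or ppn < 1 or batch_per_process < 1:
        raise ValueError("nodes, ppn, and batch_per_process must be positive")
    processes = nodes * ppn                       # one Horovod rank per MPI process
    global_batch = processes * batch_per_process  # samples consumed per training step
    return processes, global_batch


# ppn=4 and batch size 64 come from the configuration above;
# the node counts below are illustrative only.
for nodes in (1, 2, 4, 8):
    processes, global_batch = horovod_job_size(nodes, ppn=4, batch_per_process=64)
    print(f"{nodes:>2} nodes -> {processes:>3} processes, global batch {global_batch}")
```

Under these assumptions the global batch size grows linearly with the node count, which is worth keeping in mind when comparing throughput across scales.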

PyTorch Performance on Frontera

Model: ResNet-50 Batch Size: 16 Benchmark: tf_cnn_benchmark
DL Framework: MKL-PyTorch v1.2.0 ppn: 14
PyTorch on Frontera

MXNet Performance on Frontera

Model: ResNet-50 Batch Size: 64 Benchmark: tf_cnn_benchmark
DL Framework: Intel-MXNet v1.5.0 ppn: 4
MXNet on Frontera

Keras Performance on Frontera

Model: ResNet-50 Batch Size: 64 Benchmark: tf_cnn_benchmark
DL Framework: Keras ppn: 4
Keras on Frontera

TensorFlow with Horovod and MVAPICH2-X also scales well on Frontera for other deep neural network architectures. Results for ResNet-101, ResNet-152, Inception-v3, and Inception-v4 follow.
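
Scaling performance of this kind is commonly summarized as scaling efficiency: the measured throughput at N nodes divided by N times the per-node throughput of the smallest run. The sketch below shows that calculation. It assumes throughput is reported in images per second, as tf_cnn_benchmarks does. The function name is hypothetical, and the numbers in the example are placeholders, not results from this page.

```python
def scaling_efficiency(throughputs):
    """Map node count -> efficiency relative to the smallest node count.

    `throughputs` maps node count to measured throughput (e.g. images/second).
    An efficiency of 1.0 means perfectly linear scaling.
    """
    if not throughputs:
        raise ValueError("need at least one measurement")
    base_nodes = min(throughputs)
    per_node_base = throughputs[base_nodes] / base_nodes
    return {
        nodes: rate / (nodes * per_node_base)
        for nodes, rate in sorted(throughputs.items())
    }


# Placeholder numbers for illustration only; NOT measurements from this page.
example = {1: 100.0, 2: 190.0, 4: 360.0}
for nodes, efficiency in scaling_efficiency(example).items():
    print(f"{nodes} nodes: {efficiency:.0%} of ideal")
```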

TensorFlow Performance with ResNet-101

Model: ResNet-101 Batch Size: 64 Benchmark: tf_cnn_benchmark
DL Framework: Intel-TF v1.14 ppn: 4
TensorFlow and resnet101 on Frontera

TensorFlow Performance with ResNet-152

Model: ResNet-152 Batch Size: 64 Benchmark: tf_cnn_benchmark
DL Framework: Intel-TF v1.14 ppn: 4
TensorFlow and resnet152 on Frontera

TensorFlow Performance with Inception-v3

Model: Inception-v3 Batch Size: 64 Benchmark: tf_cnn_benchmark
DL Framework: Intel-TF v1.14 ppn: 4
TensorFlow and inception3 on Frontera

TensorFlow Performance with Inception-v4

Model: Inception-v4 Batch Size: 64 Benchmark: tf_cnn_benchmark
DL Framework: Intel-TF v1.14 ppn: 4
TensorFlow and inception4 on Frontera
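
The four model runs above differ only in the model name, so the per-process benchmark command can be generated from a single template. The following is a minimal sketch, not the command used for these results. It assumes the benchmark script is the `tf_cnn_benchmarks.py` script from the TensorFlow benchmarks repository, and it uses that script's `--model`, `--batch_size`, `--variable_update`, `--device`, and `--data_format` flags. The MPI launcher and its options are omitted because this page does not state them.

```python
import shlex

# Model identifiers as they appear in the figure captions above.
MODELS = ["resnet101", "resnet152", "inception3", "inception4"]


def benchmark_command(model, batch_size=64, script="tf_cnn_benchmarks.py"):
    """Build the argument list for one Horovod-enabled CPU benchmark process.

    The script path and flag values are assumptions for illustration.
    """
    return [
        "python", script,
        f"--model={model}",
        f"--batch_size={batch_size}",   # per-process batch size
        "--variable_update=horovod",    # gradients are averaged through Horovod
        "--device=cpu",
        "--data_format=NHWC",           # channels-last layout, typical for CPU runs
    ]


for model in MODELS:
    print(shlex.join(benchmark_command(model)))
```

Each printed line would still need to be wrapped in an MPI launch command that starts the desired number of processes per node.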

Machine Specifications (Stampede2)

CPU Model: Intel Xeon Platinum 8160
CPU Core Info: 2x24 @ 2.1 GHz
Memory: 192 GB
IB Card: Intel Omni-Path (100 Gbps)
OS: Red Hat 7

TensorFlow Performance on Stampede2

Model: ResNet-50 Batch Size: 128 Benchmark: tf_cnn_benchmark
DL Framework: Intel-TF v1.12
TensorFlow on Stampede2

PyTorch Performance on Stampede2

Model: ResNet-50 Batch Size: 16
Benchmark: pytorch_synthetic_benchmark
DL Framework: PyTorch v1.1
PyTorch on Stampede2

Machine Specifications (AMD EPYC)

CPU Model: AMD EPYC 7551
CPU Core Info: 2x32 @ 2.0 GHz
Memory: 256 GB

TensorFlow Performance on AMD EPYC

Model: ResNet-50 Batch Size: 32 Benchmark: tf_cnn_benchmark
DL Framework: Intel-TF v1.12 ppn: 32
TensorFlow on AMD EPYC

PyTorch Performance on AMD EPYC

Model: ResNet-50 Batch Size: 32
Benchmark: pytorch_synthetic_benchmark
DL Framework: PyTorch v1.1 ppn: 32
PyTorch on AMD EPYC
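
The ppn values listed on this page determine how many cores each process can use. The sketch below works through that arithmetic for the Frontera and AMD EPYC configurations, assuming a node's cores are divided evenly among its processes. How threads were actually pinned or configured in these runs is not stated here, so the function is only an illustration of the division.

```python
def cores_per_process(sockets, cores_per_socket, ppn):
    """Cores available to each process if a node's cores are split evenly."""
    total = sockets * cores_per_socket
    if ppn < 1 or total % ppn != 0:
        raise ValueError(f"{total} cores do not divide evenly among {ppn} processes")
    return total // ppn


# (sockets, cores per socket, ppn) taken from the configurations on this page.
configs = {
    "Frontera, ppn 4": (2, 28, 4),
    "Frontera, ppn 14": (2, 28, 14),
    "AMD EPYC, ppn 32": (2, 32, 32),
}
for name, (sockets, cores, ppn) in configs.items():
    print(f"{name}: {cores_per_process(sockets, cores, ppn)} cores per process")
```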