Horovod Performance on CPUs with MVAPICH2-X
Machine Specifications (Frontera)
CPU Model | CPU Core Info | Memory | IB Card | OS | OFED |
---|---|---|---|---|---|
Intel Xeon Platinum 8280 | 2x28 @ 2.7 GHz | 192 GB | Mellanox HDR (200 Gbps) | CentOS 7.6 | OFED 4.6-1.0.1 |
TensorFlow Performance on Frontera
Model: ResNet-50 Batch Size: 64 Benchmark: tf_cnn_benchmark DL Framework: Intel-TF v1.14 ppn: 4
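
The `ppn` (processes per node) value determines how a node's cores are shared among MPI ranks. The sketch below works through that arithmetic for the configuration above, using the 2x28 = 56 cores per node from the machine table. It assumes that cores are split evenly among ranks and that the listed batch size applies per process (weak scaling); neither is stated on this page. `RunLayout` and `plan_layout` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunLayout:
    total_ranks: int     # MPI processes across all nodes
    cores_per_rank: int  # cores available to each process (even split assumed)
    global_batch: int    # samples per step across all ranks (per-rank batch assumed)


def plan_layout(nodes: int, ppn: int, cores_per_node: int, batch_per_rank: int) -> RunLayout:
    """Derive rank count, cores per rank, and global batch size for a run."""
    if nodes <= 0 or ppn <= 0:
        raise ValueError("nodes and ppn must be positive")
    if cores_per_node % ppn != 0:
        raise ValueError("ppn must divide the core count evenly in this sketch")
    total_ranks = nodes * ppn
    return RunLayout(
        total_ranks=total_ranks,
        cores_per_rank=cores_per_node // ppn,
        global_batch=total_ranks * batch_per_rank,
    )


if __name__ == "__main__":
    # Node counts here are illustrative, not the ones used in the measurements.
    for nodes in (1, 2, 4, 8):
        print(nodes, plan_layout(nodes, ppn=4, cores_per_node=56, batch_per_rank=64))
```

Under these assumptions, `ppn: 4` leaves 14 cores to each rank, while a `ppn: 14` run on the same node leaves 4.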

PyTorch Performance on Frontera
Model: ResNet-50 Batch Size: 16 Benchmark: tf_cnn_benchmark DL Framework: MKL-PyTorch v1.2.0 ppn: 14

MXNet Performance on Frontera
Model: ResNet-50 Batch Size: 64 Benchmark: tf_cnn_benchmark DL Framework: Intel-MXNet v1.5.0 ppn: 4

Keras Performance on Frontera
Model: ResNet-50 Batch Size: 64 Benchmark: tf_cnn_benchmark DL Framework: Keras ppn: 4


TensorFlow with Horovod and MVAPICH2-X also scales well on other deep neural network architectures, including ResNet-101, ResNet-152, Inception-v3, and Inception-v4. Results for each model follow.
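
Scaling claims like this are usually quantified as scaling efficiency: measured throughput divided by the throughput of the smallest run, extrapolated linearly. The sketch below shows that calculation. The function name and the throughput numbers are hypothetical placeholders, not measurements from this page.

```python
def scaling_efficiency(throughput_by_nodes: dict) -> dict:
    """Return efficiency per node count, relative to the smallest run.

    throughput_by_nodes maps node count -> aggregate images/sec.
    An efficiency of 1.0 means perfectly linear scaling.
    """
    if not throughput_by_nodes:
        raise ValueError("need at least one measurement")
    base_nodes = min(throughput_by_nodes)
    per_node_base = throughput_by_nodes[base_nodes] / base_nodes
    return {
        nodes: images_per_sec / (per_node_base * nodes)
        for nodes, images_per_sec in sorted(throughput_by_nodes.items())
    }


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    made_up = {1: 100.0, 2: 190.0, 4: 360.0}
    for nodes, eff in scaling_efficiency(made_up).items():
        print(f"{nodes} node(s): {eff:.0%} of linear")
```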
TensorFlow Performance with ResNet-101
Model: ResNet-101 Batch Size: 64 Benchmark: tf_cnn_benchmark DL Framework: Intel-TF v1.14 ppn: 4

TensorFlow Performance with ResNet-152
Model: ResNet-152 Batch Size: 64 Benchmark: tf_cnn_benchmark DL Framework: Intel-TF v1.14 ppn: 4

TensorFlow Performance with Inception-v3
Model: Inception-v3 Batch Size: 64 Benchmark: tf_cnn_benchmark DL Framework: Intel-TF v1.14 ppn: 4

TensorFlow Performance with Inception-v4
Model: Inception-v4 Batch Size: 64 Benchmark: tf_cnn_benchmark DL Framework: Intel-TF v1.14 ppn: 4
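
The four configurations above differ only in the model, so a sweep can be generated from one template. The sketch below builds the launch commands without running them. It assumes the benchmark script is `tf_cnn_benchmarks.py` from the TensorFlow benchmarks repository, launched with MVAPICH2's `mpirun_rsh`. The hostfile name, the node count, and the `build_command` helper are hypothetical, and the exact commands and any MVAPICH2-X environment settings used for these results are not given on this page.

```python
import shlex

# Display name -> value for the --model flag of tf_cnn_benchmarks.py.
TF_CNN_MODEL_FLAGS = {
    "ResNet-101": "resnet101",
    "ResNet-152": "resnet152",
    "Inception-v3": "inception3",
    "Inception-v4": "inception4",
}


def build_command(model: str, nodes: int, ppn: int = 4, batch_size: int = 64,
                  hostfile: str = "hosts") -> list:
    """Assemble an argv list for one benchmark run (nothing is executed)."""
    return [
        "mpirun_rsh", "-np", str(nodes * ppn), "-hostfile", hostfile,  # hostfile name is a placeholder
        "python", "tf_cnn_benchmarks.py",
        f"--model={TF_CNN_MODEL_FLAGS[model]}",
        f"--batch_size={batch_size}",
        "--device=cpu",
        "--variable_update=horovod",
    ]


if __name__ == "__main__":
    for name in TF_CNN_MODEL_FLAGS:
        # nodes=2 is an arbitrary example value.
        print(shlex.join(build_command(name, nodes=2)))
```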

Machine Specifications (Stampede2)
CPU Model | CPU Core Info | Memory | IB Card | OS |
---|---|---|---|---|
Intel Xeon Platinum 8160 | 2x24 @ 2.1 GHz | 192 GB | Intel Omni-Path (100 Gbps) | Red Hat 7 |
TensorFlow Performance on Stampede2
Model: ResNet-50 Batch Size: 128 Benchmark: tf_cnn_benchmark DL Framework: Intel-TF v1.12

PyTorch Performance on Stampede2
Model: ResNet-50 Batch Size: 16 Benchmark: pytorch_synthetic_benchmark DL Framework: PyTorch v1.1
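
A synthetic benchmark of this kind generally times repeated training iterations on generated data and reports images per second. The sketch below shows one common way to aggregate such timings: the mean per-process rate, a spread of 1.96 population standard deviations, and both scaled by the number of processes. The aggregation rule, the `summarize` helper, and the sample numbers are assumptions for illustration; they are not taken from this page.

```python
import statistics


def summarize(img_secs: list, world_size: int) -> tuple:
    """Aggregate per-iteration images/sec samples from one process.

    Returns (mean, spread, total_mean, total_spread), where spread is
    1.96 population standard deviations and the totals assume every
    process sustains the same rate.
    """
    if not img_secs:
        raise ValueError("need at least one sample")
    mean = statistics.fmean(img_secs)
    spread = 1.96 * statistics.pstdev(img_secs)
    return mean, spread, world_size * mean, world_size * spread


if __name__ == "__main__":
    # Made-up samples, for illustration only.
    mean, spread, total, total_spread = summarize([100.0, 110.0, 90.0], world_size=4)
    print(f"Img/sec per process: {mean:.1f} +-{spread:.1f}")
    print(f"Total img/sec on 4 processes: {total:.1f} +-{total_spread:.1f}")
```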

Machine Specifications (AMD EPYC)
CPU Model | CPU Core Info | Memory |
---|---|---|
AMD EPYC 7551 | 2x32 @ 2.0 GHz | 256 GB |
TensorFlow Performance on AMD EPYC
Model: ResNet-50 Batch Size: 32 Benchmark: tf_cnn_benchmark DL Framework: Intel-TF v1.12 ppn: 32

PyTorch Performance on AMD EPYC
Model: ResNet-50 Batch Size: 32 Benchmark: pytorch_synthetic_benchmark DL Framework: PyTorch v1.1 ppn: 32
