MPI4DL Performance

Machine Specifications

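CPU Model     | CPU Core Info          | Memory | IB Card                 | OS              | GPU
------------- | ---------------------- | ------ | ----------------------- | --------------- | ----------------------------------
AMD EPYC 7713 | 2 x 64 cores @ 2.3 GHz | 263 GB | Mellanox HDR (200 Gbps) | Rocky Linux 8.5 | NVIDIA A100-PCIE-40GB (2 per node)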

AmoebaNet f214

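Image Size  | Batch Size | Layer Parallelism (images/sec) | Spatial Parallelism (images/sec)
----------- | ---------- | ------------------------------ | --------------------------------
1024 x 1024 | 1          | 2.68                           | 2.78
1024 x 1024 | 2          | 3.37                           | 5.02
2048 x 2048 | 1          | 0.97                           | 2.21
2048 x 2048 | 2          | 1.22                           | 2.96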

[Figure: speedup graph]

The evaluation above compares the throughput of Pipeline Parallelism and Pipeline + Spatial Parallelism. It used a dataset provided by PyTorch and was run on the OSU MRI cluster.
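A speedup can be read off the table as the ratio of the two throughput rows for the same image size and batch size. The short sketch below recomputes those ratios from the published numbers; the dictionary layout and the `speedup` helper are illustrative only and are not part of MPI4DL. On these numbers the spatial advantage is much larger at 2048 x 2048 (about 2.3x to 2.4x) than at 1024 x 1024 (about 1.0x to 1.5x).

```python
# Throughput (images/sec) transcribed from the AmoebaNet f214 table above.
# Keys are (image side length in pixels, batch size).
layer_parallel = {
    (1024, 1): 2.68,
    (1024, 2): 3.37,
    (2048, 1): 0.97,
    (2048, 2): 1.22,
}
spatial_parallel = {
    (1024, 1): 2.78,
    (1024, 2): 5.02,
    (2048, 1): 2.21,
    (2048, 2): 2.96,
}


def speedup(baseline: float, candidate: float) -> float:
    """Throughput ratio; a value above 1.0 means the candidate processes more images/sec."""
    return candidate / baseline


# Hypothetical summary: Spatial Parallelism throughput relative to Layer Parallelism.
speedups = {
    key: speedup(layer_parallel[key], spatial_parallel[key])
    for key in layer_parallel
}

for (side, batch), value in sorted(speedups.items()):
    print(f"{side} x {side}, batch {batch}: {value:.2f}x")
```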

Performance comparison of Spatial and Spatial + Bidirectional Parallelism for AmoebaNet f214

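Batch Size | Spatial Parallelism (images/sec) | Spatial + Bidirectional Parallelism (images/sec)
---------- | -------------------------------- | ------------------------------------------------
2          | 0.88                             | 1.38
4          | 0.97                             | 2.45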

Performance comparison of Spatial and Spatial + Bidirectional Parallelism for ResNet

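Batch Size | Spatial Parallelism (images/sec) | Spatial + Bidirectional Parallelism (images/sec)
---------- | -------------------------------- | ------------------------------------------------
2          | 0.65                             | 0.82
4          | 0.64                             | 1.29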
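The gain from adding Bidirectional Parallelism in the two comparison tables can likewise be expressed as a throughput ratio. The sketch below is a minimal illustration that recomputes those ratios from the reported numbers; the `results` layout and the `bidirectional_speedup` helper are hypothetical names, not MPI4DL APIs. On these numbers Spatial-only throughput stays roughly flat from batch size 2 to 4, while Spatial + Bidirectional throughput rises, so the ratio grows with batch size for both models.

```python
# Throughput (images/sec) transcribed from the two comparison tables above.
# Each entry maps batch size -> (Spatial only, Spatial + Bidirectional).
results = {
    "AmoebaNet f214": {2: (0.88, 1.38), 4: (0.97, 2.45)},
    "ResNet": {2: (0.65, 0.82), 4: (0.64, 1.29)},
}


def bidirectional_speedup(spatial: float, spatial_bidir: float) -> float:
    """Ratio of Spatial + Bidirectional throughput to Spatial-only throughput."""
    return spatial_bidir / spatial


speedups = {
    model: {batch: bidirectional_speedup(*pair) for batch, pair in by_batch.items()}
    for model, by_batch in results.items()
}

for model, by_batch in speedups.items():
    for batch, value in sorted(by_batch.items()):
        print(f"{model}, batch {batch}: {value:.2f}x")
```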

[Figure: Amoeba1024 speedup]
[Figure: ResNet speedup]

The figures above compare the throughput of Spatial Parallelism and Spatial + Bidirectional Parallelism with the following configuration: 5 model splits, 4 spatial parts, and 2 model replicas for Bidirectional Parallelism. The evaluation used a dataset provided by PyTorch and was run on the OSU MRI cluster.
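Spatial parallelism divides each input image among several GPUs. The sketch below shows one way "4 spatial parts" could map onto an image (for example 1024 x 1024), assuming a square 2 x 2 grid. `square_tiles` is a hypothetical helper written for illustration; it does not reproduce MPI4DL's own slicing logic, and it ignores the border (halo) data that convolution layers generally need to exchange between neighboring tiles.

```python
import math


def square_tiles(height: int, width: int, num_parts: int) -> list[tuple[int, int, int, int]]:
    """Split a height x width image into a sqrt(num_parts) x sqrt(num_parts) grid.

    Returns (row_start, row_end, col_start, col_end) for each tile in row-major order.
    Assumption: num_parts is a perfect square (e.g. 4 -> 2 x 2 grid).
    """
    grid = math.isqrt(num_parts)
    if grid * grid != num_parts:
        raise ValueError("num_parts must be a perfect square for square slicing")
    tiles = []
    for i in range(grid):
        for j in range(grid):
            # Integer division spreads any remainder across the grid without gaps.
            r0, r1 = i * height // grid, (i + 1) * height // grid
            c0, c1 = j * width // grid, (j + 1) * width // grid
            tiles.append((r0, r1, c0, c1))
    return tiles


for part, (r0, r1, c0, c1) in enumerate(square_tiles(1024, 1024, 4)):
    print(f"spatial part {part}: rows {r0}:{r1}, cols {c0}:{c1} ({r1 - r0} x {c1 - c0})")
```

Under this assumption each of the 4 parts handles a 512 x 512 tile, so every GPU processes a quarter of the image's activations in the spatially split layers.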