High-Performance Deep Learning :: MPI4DL Performance

MPI4DL Performance

Machine Specifications

CPU Model	CPU Core Info	Memory	IB Card	OS	GPU
AMD EPYC 7713	2x64 @ 2.3Ghz	263 GB	Mellonox HDR (200 Gbps)	Rocky Linux 8.5	NVIDIA A100-PCIE-40GB(2/Node)

AmeobaNet f214

Image Size	*1024 1024**		*2048 2048**
Batch-size	1	2	1	2
Layer Parallelism (Images/Sec)	2.68	3.37	0.97	1.22
Spatial Parallelism (Images/Sec)	2.78	5.02	2.21	2.96

Above performance evaluation compares the throughput of Pipeline Parallelism and Pipeline + Spatial Parallelism techniques. The evaluation was conducted using a dataset provided by PyTorch and was performed on the OSU MRI cluster.

Performance comparison of Spatial and Bidirectional Parallelism for Ameobanet f214

Batch-size	2	4
Spatial Parallelism (Images/ Sec)	0.88	0.97
Spatial + Bidirectional Parallelism (Images/ Sec)	1.38	2.45

Performance comparison of Spatial and Bidirectional Parallelism for ResNet

Batch-size	2	4
Spatial Parallelism (Images/ Sec)	0.65	0.64
Spatial + Bidirectional Parallelism (Images/ Sec)	0.82	1.29

Above figures compare the performance of Spatial Parallelism and Spatial + Bidirectional Parallelism techniques with the following configurations: 5 model splits,4 spatial parts, and 2 model replicas for Bidirectional Parallelism. The evaluation was conducted using a dataset provided by PyTorch and was performed on the OSU MRI cluster.

High-Performance Deep Learning (HiDL)

MPI4DL Performance

Machine Specifications

AmeobaNet f214

Performance comparison of Spatial and Bidirectional Parallelism for Ameobanet f214

Performance comparison of Spatial and Bidirectional Parallelism for ResNet