PyTorch Performance on GPUs with MVAPICH-Plus

Machine Specifications: OLCF Frontier

CPU Model: AMD EPYC 7A53 CPU
CPU Core Info: 1x64 @ 2 GHz
Memory: 512 GB DDR4
GPU Model: AMD MI250X (4/Node)
GPU Memory: 128 GB HBM2e
IB Card: HPE Slingshot (200 Gb/s)

Model: GPT-2
Batch Size: 12
Block Size: 1024
Benchmark: NanoGPT
Dataset: OpenWebText
DL Framework: PyTorch 2.6.0
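
To make the workload size concrete, the sketch below computes how many tokens are processed per step from the batch size (12) and block size (1024) listed above. It assumes the batch size is counted per GPU process, as in NanoGPT's training script; the `world_size` and `grad_accum_steps` values are hypothetical inputs, not figures reported on this page.

```python
# Values taken from the configuration listed above.
BATCH_SIZE = 12    # sequences per micro-step (assumed to be per GPU process)
BLOCK_SIZE = 1024  # tokens per sequence


def tokens_per_micro_step(batch_size: int, block_size: int) -> int:
    """Tokens one process consumes in a single forward/backward pass."""
    return batch_size * block_size


def tokens_per_optimizer_step(batch_size: int, block_size: int,
                              world_size: int, grad_accum_steps: int = 1) -> int:
    """Tokens consumed across all processes for one optimizer update.

    world_size and grad_accum_steps are illustrative parameters; this page
    does not state the values used in the benchmark runs.
    """
    return batch_size * block_size * world_size * grad_accum_steps


if __name__ == "__main__":
    print(tokens_per_micro_step(BATCH_SIZE, BLOCK_SIZE))
    # Hypothetical example: 8 processes, no gradient accumulation.
    print(tokens_per_optimizer_step(BATCH_SIZE, BLOCK_SIZE, world_size=8))
```

Under these assumptions, each process handles 12 x 1024 = 12288 tokens per micro-step, and the total per optimizer update scales linearly with the number of processes.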

We welcome users to try out our pre-built PyTorch 2.7.1 wheel on Frontier:

https://hidl.cse.ohio-state.edu/download/hidl/wheels/torch-2.7.1a0+gitc812406-cp312-cp312-linux_x86_64.whl
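
Before installing, it can help to confirm that the local interpreter matches the tags in the wheel filename above (`cp312`, `linux_x86_64`). The following is a minimal sketch using only the standard library; the helper names `parse_wheel_tags` and `interpreter_matches` are hypothetical, and the check assumes a CPython interpreter and a filename without an optional build tag.

```python
import platform
import sys

# Filename taken from the download URL above.
WHEEL = "torch-2.7.1a0+gitc812406-cp312-cp312-linux_x86_64.whl"


def parse_wheel_tags(filename: str) -> dict:
    """Split a wheel filename into name, version, and compatibility tags.

    Assumes the five-field layout name-version-python-abi-platform
    (no optional build tag).
    """
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_matches(tags: dict) -> bool:
    """Rough check that the running interpreter fits the wheel's tags."""
    expected_python = f"cp{sys.version_info.major}{sys.version_info.minor}"
    expected_platform = f"{platform.system().lower()}_{platform.machine()}"
    return (
        sys.implementation.name == "cpython"
        and tags["python"] == expected_python
        and tags["platform"] == expected_platform
    )


if __name__ == "__main__":
    tags = parse_wheel_tags(WHEEL)
    print(tags)
    print("compatible with this interpreter:", interpreter_matches(tags))
```

This is only a coarse pre-check: pip performs the authoritative compatibility resolution at install time, and the wheel additionally expects the GPU and MPI software environment available on Frontier.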
[Figure: GPT-2 results on OLCF Frontier (1)]
[Figure: GPT-2 results on OLCF Frontier (2)]

Machine Specifications: TACC Vista

CPU Model: NVIDIA Grace CPU
CPU Core Info: 1x72 @ 3.1 GHz
Memory: 116 GB DDR5
GPU Model: NVIDIA H200 GPU (1/Node)
GPU Memory: 96 GB HBM3
IB Card: Mellanox NDR (400 Gb/s)

Model: GPT-2
Batch Size: 12
Block Size: 1024
Benchmark: NanoGPT
Dataset: OpenWebText
DL Framework: PyTorch 2.6.0
[Figure: NanoGPT results on TACC Vista]