Welcome to the High-Performance Deep Learning (HiDL) project, created by the Network-Based Computing Laboratory of The Ohio State University. The availability of large data sets (e.g. ImageNet, PASCAL VOC 2012), coupled with massively parallel processors in modern HPC systems (e.g. NVIDIA GPUs), has fueled a renewed interest in Deep Learning (DL) algorithms. In addition to the popularity of massively parallel DL accelerators like GPUs, the availability and abundant memory of modern CPUs make them a viable alternative for DL training. This resurgence of DL applications has triggered the development of DL frameworks like Caffe, PyTorch, TensorFlow, Apache MXNet, and CNTK. While most DL frameworks provide experimental support for multi-node training, their distributed implementations are often suboptimal. The objective of the HiDL project is to exploit modern HPC technologies and solutions to scale out and accelerate DL frameworks.

The HiDL packages are used by more than 85 organizations in 21 countries (Current Users) to accelerate Deep Learning and Machine Learning applications. As of September 2023, the packages have been downloaded more than 2,650 times from this project's site. The HiDL project contains the following packages.

MPI4DL v0.5

MPI4DL v0.5 is a distributed and accelerated training framework for very high-resolution images that integrates Spatial Parallelism, Layer Parallelism, and Pipeline Parallelism.

  • Based on PyTorch
  • (NEW) Support for training very high-resolution images
    • Distributed training support for:
      • (NEW) Layer Parallelism (LP)
      • (NEW) Pipeline Parallelism (PP)
      • (NEW) Spatial Parallelism (SP)
      • (NEW) Spatial and Layer Parallelism (SP+LP)
      • (NEW) Spatial and Pipeline Parallelism (SP+PP)
    • (NEW) Support for AmoebaNet and ResNet models
    • (NEW) Support for different image sizes and custom datasets
  • (NEW) Exploits collective features of MVAPICH2-GDR
  • Compatible with
    • (NEW) NVIDIA GPU A100 and V100
    • (NEW) CUDA [11.6, 11.7]
    • (NEW) Python >= 3.8
    • (NEW) PyTorch [1.12.1 , 1.13.1]
    • (NEW) MVAPICH2-GDR = 2.3.7
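
The idea behind Spatial Parallelism can be illustrated with a small sketch: a very high-resolution image is split into tiles, and each tile is assigned to one rank (GPU). The code below is not the MPI4DL API. The function name `spatial_partition`, the row-major rank layout, and the `halo` parameter (extra border pixels that a convolution would need from neighboring tiles) are all assumptions made for illustration.

```python
def spatial_partition(height, width, grid_rows, grid_cols, halo=0):
    """Map each rank to the image region it would hold.

    Hypothetical helper, not part of MPI4DL. Returns a dict
    {rank: (row_start, row_end, col_start, col_end)} with end-exclusive
    bounds, widened by `halo` pixels and clipped to the image.
    """
    if height % grid_rows or width % grid_cols:
        raise ValueError("image dimensions must divide evenly by the grid")
    tile_h = height // grid_rows
    tile_w = width // grid_cols
    tiles = {}
    for r in range(grid_rows):
        for c in range(grid_cols):
            rank = r * grid_cols + c  # row-major rank layout (assumption)
            tiles[rank] = (
                max(0, r * tile_h - halo),
                min(height, (r + 1) * tile_h + halo),
                max(0, c * tile_w - halo),
                min(width, (c + 1) * tile_w + halo),
            )
    return tiles


# A 2048x2048 image split over a 2x2 grid of ranks, with a 1-pixel halo
# such as a 3x3 convolution would need.
for rank, bounds in spatial_partition(2048, 2048, 2, 2, halo=1).items():
    print(rank, bounds)
```

In a real spatially parallel run, the halo regions would be exchanged between neighboring ranks through the communication library rather than read from a shared image.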

MPI-Driven DL Training (TensorFlow, PyTorch, MXNet) with Horovod and MVAPICH2

The HiDL software suite version 1.0 is a high-performance deep learning stack based on the MVAPICH2 high-performance CUDA-aware communication backend. HiDL uses Horovod over the MVAPICH2 and MVAPICH2-GDR backends to support large-scale distributed deep learning workloads, and it targets modern HPC clusters built with CPUs, dense GPUs, and high-performance interconnects.

The 1.0 release of the HiDL stack introduces the following features:

  • Based on Horovod
  • Full support for TensorFlow, PyTorch, Keras and Apache MXNet
  • Optimized support for MPI controller in deep learning workloads
    • Efficient large-message collectives (e.g. Allreduce) on various CPUs and GPUs
    • GPU-Direct Algorithms for all collective operations (including those commonly used for data and model-parallelism, e.g. Allgather and Alltoall)
    • Support for fork safety
    • Exploits efficient large message collectives in MVAPICH2 and MVAPICH2-GDR
  • Compatible with
    • Mellanox InfiniBand adapters (e.g., EDR, FDR, HDR)
    • NVIDIA GPU K80, P100, V100, Quadro RTX 5000, A100
    • CUDA [9.x, 10.x, 11.x] and CUDNN [7.5.x, 7.6.x, 8.0.x, 8.2.x, 8.4.x]
    • (NEW) AMD MI100 GPUs
    • (NEW) ROCm [5.1.x]
    • TensorFlow [1.x, 2.x], PyTorch 1.x, Apache MXNet 1.x
    • (NEW) Horovod [0.24.0, 0.25.0, 0.26.0, 0.27.0]
    • (NEW) Python [3.x]
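
To show why large-message Allreduce matters for data-parallel training, here is a minimal pure-Python sketch of what the collective computes: every worker contributes its local gradients and every worker receives the same average. This is a single-process simulation, not Horovod or MVAPICH2 code, and the function name `allreduce_average` is hypothetical. A real MPI library would use an optimized algorithm over the interconnect instead of this direct sum.

```python
def allreduce_average(per_worker_grads):
    """Simulate an Allreduce (sum) followed by division by the worker count.

    Hypothetical illustration only. `per_worker_grads` is a list with one
    gradient vector (list of floats) per worker. Returns the buffer each
    worker would hold afterwards: identical averaged gradients.
    """
    num_workers = len(per_worker_grads)
    length = len(per_worker_grads[0])
    if any(len(g) != length for g in per_worker_grads):
        raise ValueError("all workers must contribute vectors of equal length")
    # Element-wise sum across workers: the reduction step.
    summed = [sum(g[i] for g in per_worker_grads) for i in range(length)]
    averaged = [s / num_workers for s in summed]
    # Every worker ends up with its own copy of the result.
    return [list(averaged) for _ in range(num_workers)]


# Two workers, each with gradients for two parameters.
print(allreduce_average([[1.0, 2.0], [3.0, 4.0]]))
```

Because the gradient vector of a modern network can hold millions of elements and this exchange happens on every training step, the cost of the collective tends to dominate communication time at scale.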

Horovod Performance on MVAPICH2-X and MVAPICH2-GDR

For instructions on building Horovod with MVAPICH2-X or MVAPICH2-GDR, please refer to the Horovod Userguide.

MPI-Driven ML Training with MPI4cuML

cuML is a distributed machine learning training framework with a focus on GPU acceleration and distributed computing. MVAPICH2-GDR provides many features to augment distributed training with cuML on GPUs.

  • Based on cuML 22.02.00
    • Includes ready-to-use examples for KMeans, Linear Regression, Nearest Neighbors, and tSVD
  • MVAPICH2 support for RAFT 22.02.00
    • Enabled cuML’s communication engine, RAFT, to use the MVAPICH2-GDR backend for Python and C++ cuML applications
    • Supported algorithms: KMeans, PCA, tSVD, RF, LinearModels
    • Added switch between available communication backends (MVAPICH2 and NCCL)
  • Built on top of mpi4py over the MVAPICH2-GDR library
  • Tested with
    • Mellanox InfiniBand adapters (FDR and HDR)
    • NVIDIA GPU A100, V100, and P100
    • Various x86-based multi-core platforms (AMD and Intel)
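
As a rough picture of how a communication backend fits into distributed ML training, the sketch below runs one KMeans iteration over data sharded across simulated workers. Each worker computes per-centroid sums and counts on its own shard, and a global element-wise sum (the role a collective such as Allreduce would play) combines them. This is not cuML or RAFT code. The function names, the 1-D data, and the in-process "workers" are assumptions made for illustration.

```python
def local_stats(points, centroids):
    """Per-worker step: sum and count of the points nearest each centroid."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
        sums[nearest] += p
        counts[nearest] += 1
    return sums, counts


def kmeans_step(shards, centroids):
    """One KMeans iteration over data sharded across simulated workers."""
    stats = [local_stats(shard, centroids) for shard in shards]
    k_range = range(len(centroids))
    # Element-wise sums across workers stand in for the collective.
    total_sums = [sum(s[0][k] for s in stats) for k in k_range]
    total_counts = [sum(s[1][k] for s in stats) for k in k_range]
    # Keep the old centroid if no point was assigned to it.
    return [
        total_sums[k] / total_counts[k] if total_counts[k] else centroids[k]
        for k in k_range
    ]


# Two workers, each holding one shard of 1-D points.
print(kmeans_step([[1.0, 2.0], [9.0, 11.0]], [0.0, 10.0]))
```

Only the small per-centroid sums and counts cross worker boundaries here, never the raw data points.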

cuML Performance on MVAPICH2-GDR

For instructions on building cuML with MVAPICH2-GDR, please refer to the Userguide for MPI4cuML 0.5.


(NEW) The 11th Annual MVAPICH User Group (MUG) Conference was held in a hybrid format on August 21-23, 2023, with more than 225 attendees. Slides and videos of the presentations are available here.

HiDL 1.0 (based on Horovod), with support for TensorFlow, PyTorch, Keras and MXNet, is available. It is built on top of MVAPICH2-GDR and MVAPICH2-X and provides large-scale distributed deep learning support for clusters with NVIDIA and AMD GPUs. [more]

MPI4cuML 0.5 (based on cuML 22.02.00) is available. It supports RAFT 22.02.00 and the C++ and Python APIs, is built on top of mpi4py over the MVAPICH2-GDR library, and provides handles for using the MVAPICH2-GDR backend in Python cuML applications (KMeans, PCA, tSVD, RF, and LinearModels). [more]

Partnership and contribution to the NSF-Awarded $20M AI-Institute on Intelligent CyberInfrastructure (ICICLE). Details.

HiDL in the News