The HiDL team members participated in multiple events during SC'17!
Please refer here for the presentation slides!


HiDL@SC17


Overview

Welcome to the High-Performance Deep Learning (HiDL) project created by the Network-Based Computing Laboratory of The Ohio State University. The availability of large data sets like ImageNet and the massively parallel computation support in modern HPC devices like NVIDIA GPUs have fueled a renewed interest in Deep Learning (DL) algorithms. This has triggered the development of DL frameworks like Caffe, Torch, TensorFlow, and CNTK. However, most DL frameworks have been limited to a single node. The objective of the HiDL project is to exploit modern HPC technologies and solutions to scale out and accelerate DL frameworks.


OSU-Caffe

The OSU-Caffe library is a scalable and distributed Caffe adaptation for modern multi-GPU clusters. It is built by co-designing the Caffe framework with the widely used MVAPICH2-GDR MPI runtime. The co-design methodology involves re-designing Caffe’s workflow to maximize the overlap of computation and communication. It also brings DL-awareness to the MPI runtime by designing efficient CUDA-aware collective operations for very large messages. The major features of OSU-Caffe 0.9 are given below.

  • Based on NVIDIA's Caffe fork (caffe-0.14)
  • MPI-based distributed training support
  • Efficient scale-out support for multi-GPU node systems
  • New workflow to overlap the compute layers and the communication
  • Efficient parallel file readers to optimize I/O and data movement
    • Takes advantage of Lustre Parallel File System
  • Exploits efficient large message collectives in MVAPICH2-GDR 2.2
  • Tested with
    • Various CUDA-aware MPI libraries
    • CUDA 7.5
    • Various HPC clusters with K80 GPUs, varying numbers of GPUs per node, and InfiniBand (FDR and EDR) adapters
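
The idea of overlapping the compute layers with communication can be illustrated with a small sketch. This is not OSU-Caffe's code: it is a minimal, single-process Python illustration in which a thread pool stands in for non-blocking MPI collectives, and the names allreduce_mean, backward_layer, and overlapped_backward are hypothetical. The gradient values are placeholders chosen only so the result is easy to check.

```python
from concurrent.futures import ThreadPoolExecutor


def allreduce_mean(per_worker_grads):
    """Stand-in for an MPI allreduce: element-wise mean across workers."""
    n = len(per_worker_grads)
    return [sum(vals) / n for vals in zip(*per_worker_grads)]


def backward_layer(layer_idx, worker_idx, size=4):
    """Hypothetical gradient computation for one layer on one worker."""
    return [float(layer_idx + worker_idx + k) for k in range(size)]


def overlapped_backward(num_layers, num_workers):
    """Start each layer's reduction as soon as its gradients exist,
    and keep computing earlier layers while the reduction runs."""
    pending = {}
    with ThreadPoolExecutor(max_workers=2) as comm:
        # The backward pass visits layers from last to first.
        for layer in reversed(range(num_layers)):
            grads = [backward_layer(layer, w) for w in range(num_workers)]
            # Non-blocking: submit the reduction and move on to the next layer.
            pending[layer] = comm.submit(allreduce_mean, grads)
        # Block only at the point where averaged gradients are needed.
        return {layer: fut.result() for layer, fut in pending.items()}


reduced = overlapped_backward(num_layers=3, num_workers=2)
```

In a real multi-node run, the submitted task would roughly correspond to a non-blocking collective issued through a CUDA-aware MPI library, and the final wait would happen just before the weight update.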

Announcements


Upcoming Tutorial: High Performance Distributed Deep Learning for Dummies at Hot Interconnects 2017.

OSU-Caffe 0.9 (based on NVIDIA's Caffe fork, caffe-0.14) is available. It supports MPI-based distributed training and efficient scale-out on multi-GPU nodes, introduces a new workflow that overlaps the compute layers with communication, optimizes I/O and data movement with parallel file readers that take advantage of Lustre, and exploits the large-message collectives in the MVAPICH2-GDR 2.2 library.

HiDL in the News