OSU-Caffe 0.9 Features

OSU-Caffe derives from Caffe, a Deep Learning (DL) framework that provides the flexibility to design and enhance DL models. All the features available in NVIDIA's fork of BVLC Caffe are available in this release. OSU-Caffe offers additional features and mechanisms that take advantage of HPC resources. It is an MPI-based distributed version that scales out on systems with multi-GPU nodes. It takes advantage of optimized CUDA-aware MPI to boost its performance on GPU clusters. OSU-Caffe redesigns the DL workflow to overlap computation and communication. Further, it uses efficient large-message MPI collective communication operations on GPU buffers, which exploit GPUDirect RDMA, CUDA IPC, CUDA kernels, and Core-Direct features.
Features for supporting distributed and large-scale DL frameworks:
- Based on NVIDIA's Caffe fork (caffe-0.14)
- MPI-based distributed training support
- Efficient scale-out support for systems with multi-GPU nodes
- New workflow that overlaps the compute layers with communication
- Efficient parallel file readers to optimize I/O and data movement
- Takes advantage of the Lustre parallel file system
- Exploits efficient large-message collectives in MVAPICH2-GDR 2.2
- Tested with
  - Various CUDA-aware MPI libraries
  - CUDA 7.5
  - Various HPC clusters with K80 GPUs, varying numbers of GPUs per node, and InfiniBand (FDR and EDR) adapters
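
The idea of overlapping the compute layers with communication can be illustrated with a small, self-contained sketch. This is not OSU-Caffe's implementation: it is a toy model in plain Python in which a background thread stands in for a non-blocking MPI collective, and the names `allreduce_mean`, `compute_layer_grads`, and `backward_with_overlap` are hypothetical. The gradient values are made up for illustration. The point it shows is that the gradient reduction for one layer can be started as soon as that layer's backward step finishes, while the backward pass continues with the next layer.

```python
from concurrent.futures import ThreadPoolExecutor


def allreduce_mean(per_worker_grads):
    """Stand-in for an MPI allreduce: element-wise mean across workers."""
    n = len(per_worker_grads)
    return [sum(vals) / n for vals in zip(*per_worker_grads)]


def compute_layer_grads(layer, num_workers):
    """Hypothetical backward step: one 2-element gradient per worker."""
    return [[float(layer + w), float(layer * w)] for w in range(num_workers)]


def backward_with_overlap(num_layers, num_workers):
    """Run the backward pass, reducing each layer's gradients in the background."""
    pending = {}
    with ThreadPoolExecutor(max_workers=2) as comm:
        # Backward pass visits layers from last to first.
        for layer in reversed(range(num_layers)):
            grads = compute_layer_grads(layer, num_workers)
            # Start the reduction and move on to the next layer immediately.
            pending[layer] = comm.submit(allreduce_mean, grads)
        # Wait for all outstanding reductions before the weight update.
        return {layer: fut.result() for layer, fut in pending.items()}


averaged = backward_with_overlap(num_layers=3, num_workers=4)
```

In a real distributed run, the background reduction would presumably be a non-blocking collective on GPU buffers issued by each MPI rank, rather than a thread averaging Python lists.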