MVAPICH-Plus Changelog
-----------------------
This file briefly describes the changes to the MVAPICH-Plus software
package. The logs are arranged in the "most recent first" order.

MVAPICH-Plus 4.1 (4.1rc released 06/09/2025)

* Features and Enhancements (since 4.0)
    - Added support for adaptive dynamic collective tuning
        - Performed on the fly at runtime for both CPU and GPU
    - Enhanced GPU tuning support
    - Optimized HIP kernel-based collective performance for AMD GPUs
    - Optimized algorithms for GPU Allreduce
        - GPU-aware RSA, RD, Direct, and Ring algorithms
    - Optimized direct-throttling algorithms for GPU Allgather, Allgatherv,
      and Reduce-Scatter
    - Optimized ring algorithm for GPU Reduce-Scatter
    - Added support for dynamic GPU initialization after MPI_Init
    - Added support for unified GPU memory models for AMD MI300A APUs
    - Improved rendezvous (rndv) protocol performance for point-to-point
      operations
    - Improved MPI_T PVAR support

* Bug Fixes (since 4.0)
    - Fixed pointer caching for ROCm 6
    - Fixed memory leaks
    - Fixed issues with GPU binding in UCX-enabled builds

MVAPICH-Plus 4.0 (4.0a released 07/26/2024)
                 (4.0b released 08/16/2024)
                 (4.0rc released 11/08/2024)
                 (4.0 GA released 12/20/2024)

* Overall Features and Enhancements
    - Based on MPICH 4.3.0a1
    - Supports all features of the MPI 4.1 standard
    - Includes an enhanced OFI provider for IB systems, "mverbs;ofi_ucr"
    - Support for
        * Major CPUs (x86-Intel, x86-AMD, and ARM)
        * Major interconnects (IB, Slingshot, OPX, Omni-Path, RoCE, and
          Ethernet/iWARP)
        * Major GPUs (from NVIDIA, AMD, and Intel)
    - Optimized support for pt2pt inter-node and intra-node communication
    - CMA support for intra-node pt2pt operations
    - Optimized algorithms for collectives
    - CUDA-aware MPI (pt2pt and collective) support
    - Support for NVIDIA GDRCOPY and AMD LARGEBAR GPU copy operations
    - Optimized IPC-based support for collectives on Intel GPUs
        - Allreduce and Reduce
    - Support for kernel-based Allreduce on NVIDIA/AMD/Intel GPUs
    - On-the-fly compression support for collectives using GPU buffers on
      NVIDIA GPUs
        - Multi-stream ZFP-based compression
        - Allgather, Alltoall, Allreduce, and Reduce-Scatter
    - On-the-fly compression support for collectives using GPU buffers on
      AMD GPUs
        - ZFP-based compression
        - Allgather, Alltoall
    - On-the-fly compression support for point-to-point operations using
      GPU buffers on NVIDIA GPUs
    - On-the-fly compression support for point-to-point operations using
      GPU buffers on AMD GPUs