IBing: An Efficient Interleaved Bidirectional Ring All-Reduce Algorithm for Gradient Synchronization

Published: 2025, Last Modified: 07 Jan 2026. ACM Trans. Archit. Code Optim. 2025. License: CC BY-SA 4.0
Abstract: Ring all-reduce is currently the most commonly used collective communication technique in data-parallel and distributed computing. Each step consists of three phases: communication establishment, data transmission, and data processing. However, this method may suffer from increased communication latency as the number of computation nodes increases, and excessive communication steps and data processing procedures can lead to insufficient bandwidth utilization. To address this issue, this article proposes an Interleaved Bidirectional Ring (IBing) all-reduce method, which uses specially crafted communication operations to improve communication efficiency by reducing the impact of both communication establishment and data processing time. IBing halves the number of communication steps compared with Ring all-reduce. Extensive experiments indicate that the proposed IBing design reduces total communication cost by 8.49% on average and by up to 49.73%.
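For context on the step count, the following is a minimal single-process sketch of the standard Ring all-reduce baseline (reduce-scatter followed by all-gather), not of IBing itself; the abstract does not describe IBing's schedule, so none is attempted here. The function name `ring_all_reduce`, the one-value-per-chunk layout, and the convention that one step means every node sends one chunk to its right-hand neighbour are assumptions made for illustration.

```python
def ring_all_reduce(data):
    """Simulate the standard ring all-reduce on n nodes.

    data[i][c] is node i's local value for chunk c (n nodes, n chunks).
    Returns (buffers, steps), where every row of buffers ends up holding
    the element-wise sum over all nodes.
    """
    n = len(data)
    buf = [list(row) for row in data]  # copy so the input stays untouched
    steps = 0

    # Phase 1, reduce-scatter: in step s, node i sends chunk (i - s) mod n
    # to node (i + 1) mod n, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        # Snapshot all sends first so the step behaves as if simultaneous.
        sends = [((i - s) % n, buf[i][(i - s) % n]) for i in range(n)]
        for i, (chunk, value) in enumerate(sends):
            buf[(i + 1) % n][chunk] += value
        steps += 1

    # Phase 2, all-gather: node i now owns the fully reduced chunk
    # (i + 1) mod n and forwards reduced chunks around the ring.
    for s in range(n - 1):
        sends = [((i + 1 - s) % n, buf[i][(i + 1 - s) % n]) for i in range(n)]
        for i, (chunk, value) in enumerate(sends):
            buf[(i + 1) % n][chunk] = value  # overwrite with the reduced value
        steps += 1

    return buf, steps


if __name__ == "__main__":
    result, steps = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
    print(result, steps)
```

Under this counting convention the baseline takes 2(N-1) steps for N nodes, and each step pays the establishment and processing phases once. If the abstract's halving claim is measured against the same convention, IBing would need roughly N-1 steps, but that reading is an inference rather than something the abstract states.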