Understanding Communication Characteristics of Distributed Training

Published: 01 Jan 2024, Last Modified: 07 Aug 2024, Venue: APNet 2024, License: CC BY-SA 4.0
Abstract: Communication is pivotal in distributed training, and a thorough understanding of its characteristics is essential for future optimizations. However, prior work is limited: it either focuses on customized optimizations or explores communication characteristics only partially. In this work, we systematically analyze the communication characteristics of distributed training along two key aspects, pattern and overhead, and assess a broad spectrum of determinant factors. In particular, we extensively investigate features of communication patterns, such as predictability, and comprehensively evaluate the impact of various factors on communication overhead. Additionally, we develop and validate an analytical formulation to estimate communication overhead, providing a mathematical understanding of models with predictable communication patterns.