PerfTop: Towards performance prediction of distributed learning over general topology

Published: 01 Jan 2024, Last Modified: 18 Jan 2025. J. Parallel Distributed Comput. 2024. License: CC BY-SA 4.0
Abstract: Highlights
• A new performance prediction framework (termed PerfTop) is proposed to accurately predict the execution time of distributed learning over general topologies.
• The framework provides an in-depth analysis of the underlying communication mechanisms and accounts for the overlap between computation and communication.
• Extensive experiments show that PerfTop predicts the iteration time of distributed training over general topologies with an accuracy above 85%.
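The highlights mention two ideas: per-topology communication cost and computation/communication overlap. The sketch below is a toy illustration of how those two ideas can combine into an iteration-time estimate. It is NOT PerfTop's model. Several pieces are assumptions made only for this example: a simple alpha-beta link cost (startup latency plus bytes/bandwidth), gradients sent in buckets on a single communication stream, each node exchanging every bucket with its neighbours one link at a time, and synchronous training in which the slowest node sets the pace. The "accuracy" metric (1 minus relative error) is also an assumed definition. All names (`Link`, `node_iteration_time`, `predict_iteration_time`, `prediction_accuracy`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    latency_s: float      # per-message startup cost (alpha), assumed model
    bandwidth_Bps: float  # link bandwidth in bytes per second, assumed model


def bucket_comm_time(bucket_bytes, links):
    # Hypothetical model: a node exchanges a bucket with each neighbour
    # one link after another, so the per-link alpha-beta costs add up.
    return sum(l.latency_s + bucket_bytes / l.bandwidth_Bps for l in links)


def node_iteration_time(fwd_s, bwd_bucket_s, bucket_bytes, links):
    """Simulate backward computation overlapped with bucketed communication.

    bwd_bucket_s[i] is the backward compute time that produces bucket i;
    bucket_bytes[i] is its size. Communication of bucket i starts once the
    bucket is ready AND the previous bucket's transfer has finished
    (one communication stream per node, an assumption of this sketch).
    """
    t_compute = fwd_s      # the forward pass is not overlapped here
    comm_free_at = 0.0     # time at which the comm stream becomes idle
    for comp, size in zip(bwd_bucket_s, bucket_bytes):
        t_compute += comp                      # bucket is ready at this time
        start = max(t_compute, comm_free_at)   # wait for data and for stream
        comm_free_at = start + bucket_comm_time(size, links)
    return max(t_compute, comm_free_at)


def predict_iteration_time(topology, fwd_s, bwd_bucket_s, bucket_bytes):
    # topology maps each node to the list of links to its neighbours.
    # Synchronous training is assumed: the slowest node sets the pace.
    return max(
        node_iteration_time(fwd_s, bwd_bucket_s, bucket_bytes, links)
        for links in topology.values()
    )


def prediction_accuracy(predicted, measured):
    # Assumed definition: 1 minus the relative prediction error.
    return 1.0 - abs(predicted - measured) / measured


if __name__ == "__main__":
    # Three nodes in a line A - B - C; every link moves 100 bytes in 1 s.
    link = Link(latency_s=0.0, bandwidth_Bps=100.0)
    topo = {"A": [link], "B": [link, link], "C": [link]}
    t = predict_iteration_time(topo, 1.0, [1.0, 1.0], [100, 100])
    print(t)  # 6.0: node B, with two neighbours, is the bottleneck
```

In this example node B would need 1 + 2 + 4 = 7 s if it computed and then communicated strictly in sequence. Overlapping the first bucket's transfer with the second bucket's backward pass hides 1 s, giving 6 s. The topology matters because B's two links double its per-bucket communication time relative to A and C.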