Keywords: data echoing, data loading bottleneck
Abstract: Over the past decade, breakthroughs in both general-purpose and specialized hardware have propelled the success of large-scale machine learning. However, advances in general-purpose hardware have not kept pace with those in specialized hardware, so operations executed on general-purpose hardware have become the primary performance bottleneck. In particular, data loading significantly lags behind gradient computation during training. To address this issue, data echoing has been proposed: the current batch of samples is reused for additional gradient steps to minimize accelerator idle time while waiting for new data. However, this approach risks overfitting to the current batch, and it has remained unclear whether convergence actually benefits from it. In this paper, we provide a sharper analysis of a stochastic variant of data echoing and show that it achieves linear speedup proportional to the number of reuses. We further investigate the communication bottleneck that arises when data echoing is combined with data parallelism, and propose a new communication-efficient data echoing algorithm that reduces the frequency of model averaging. We show that data echoing can be performed under data parallelism without additional communication cost. Finally, we conduct experiments to empirically validate our analysis of data echoing and the proposed communication-efficient algorithm.
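For intuition, here is a minimal sketch of the basic data echoing loop described in the abstract (PyTorch-style; the name `echo_factor` and the toy setup are illustrative assumptions, not the paper's exact algorithm or notation):

```python
# Minimal sketch of data echoing, assuming a standard PyTorch training loop.
# `echo_factor` (how many gradient steps reuse each fetched batch) is an
# illustrative name; the paper's stochastic variant is not reproduced here.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_with_echoing(model, loader, optimizer, loss_fn, echo_factor=4):
    model.train()
    for inputs, targets in loader:       # slow: bounded by the data pipeline
        for _ in range(echo_factor):     # fast: reuse the in-memory batch
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()

# Toy usage: a linear model on synthetic data.
data = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
model = nn.Linear(10, 1)
train_with_echoing(model, DataLoader(data, batch_size=32),
                   torch.optim.SGD(model.parameters(), lr=0.01),
                   nn.MSELoss(), echo_factor=4)
```

Under data parallelism, the communication-efficient variant mentioned in the abstract would, by analogy with local-SGD-style methods, let each worker run its echoed local steps and average models only every few batches rather than after every update; the paper's exact averaging schedule is not reproduced in this sketch.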
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12780