1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed

ICML 2021 (modified: 31 Mar 2022)
Abstract: Scalable training of large models (like BERT and GPT-3) requires careful optimization rooted in model design, architecture, and system capabilities. From a system standpoint, communication has beco...
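The abstract is cut off here, but the core idea named in the title is compressing optimizer communication down to 1 bit per element while preserving Adam's convergence. A minimal sketch of that style of compression, using sign quantization with a per-tensor scale and local error feedback (the function and variable names are illustrative, not the paper's actual API):

```python
import numpy as np

def one_bit_compress(update, error):
    """Quantize an update tensor to 1 bit per element (its sign),
    scaled by one scalar, and carry the quantization error into
    the next step (error feedback / error compensation)."""
    corrected = update + error            # fold in residual from the last step
    scale = np.abs(corrected).mean()      # single scalar shipped alongside the bits
    compressed = scale * np.sign(corrected)  # what is actually communicated
    new_error = corrected - compressed    # residual kept locally, never sent
    return compressed, new_error

# Toy usage: error feedback means the residual exactly accounts
# for what the 1-bit representation dropped this step.
u = np.array([0.5, -1.0, 2.0])
c, e = one_bit_compress(u, np.zeros_like(u))
assert np.allclose(c + e, u)
```

In the paper's full scheme, compression is only enabled after a full-precision warmup phase, during which Adam's variance term stabilizes; this sketch shows just the per-step compression operator.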