2021 (modified: 31 Mar 2022), ICML 2021, Readers: Everyone
Abstract: Scalable training of large models (like BERT and GPT-3) requires careful optimization rooted in model design, architecture, and system capabilities. From a system standpoint, communication has beco...