Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Abstract: Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study...