Efficient Large-Scale Autoregressive Sequence Models

Anonymous

17 Apr 2022 (modified: 05 May 2023) · ACL ARR 2022 April Blind Submission · Readers: Everyone
Abstract: Large deep-learning-based autoregressive models achieve state-of-the-art performance on many sequence-to-sequence tasks, including neural machine translation. Deep Ensembles of these systems yield performance gains over individual models and allow uncertainty estimates, including knowledge uncertainty, to be derived for their predictions. The challenge with these ensembles is that training cost, memory requirements and inference cost all scale linearly with the number of ensemble members. In this work we explore how to train autoregressive models efficiently while retaining the ability to make robust uncertainty estimates in a single forward pass. The approach combines efficient ensemble generation with distribution distillation, dramatically reducing the computational and memory costs compared to Deep Ensembles. Experiments on WMT16 and WMT20 show that single models trained with the proposed scheme can match or outperform Deep Ensembles while being much cheaper at training and inference time. Additionally, by extending existing distribution distillation techniques, a single model can be trained to consistently outperform a Deep Ensemble on out-of-distribution detection.
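
The abstract does not spell out the exact distillation objective, so the sketch below is only a rough illustration of the general idea: distilling an ensemble's mean predictive distribution into a single model with a KL loss, and reading knowledge uncertainty off the ensemble as the mutual information between the prediction and the ensemble member. The function names, tensor shapes and temperature value are assumptions for illustration, not the paper's implementation.

    # Minimal sketch, assuming standard KL-based ensemble distillation (not the paper's exact method).
    import torch
    import torch.nn.functional as F

    def ensemble_distillation_loss(student_logits, ensemble_logits, temperature=1.0):
        # student_logits:  [batch, seq_len, vocab]          logits of the single student model
        # ensemble_logits: [members, batch, seq_len, vocab] logits of each ensemble member
        ensemble_probs = F.softmax(ensemble_logits / temperature, dim=-1).mean(dim=0)
        student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        # KL divergence from the ensemble's mean distribution to the student's distribution.
        return F.kl_div(student_log_probs, ensemble_probs, reduction="batchmean") * temperature ** 2

    def knowledge_uncertainty(ensemble_logits):
        # Knowledge uncertainty as mutual information: total uncertainty (entropy of the
        # mean prediction) minus expected data uncertainty (mean entropy of each member).
        probs = F.softmax(ensemble_logits, dim=-1)            # [members, batch, seq_len, vocab]
        mean_probs = probs.mean(dim=0)
        total = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
        expected_data = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean(dim=0)
        return total - expected_data                          # [batch, seq_len]

    if __name__ == "__main__":
        # Random logits stand in for real model outputs in this illustrative usage.
        members, batch, seq_len, vocab = 4, 2, 8, 100
        ensemble_logits = torch.randn(members, batch, seq_len, vocab)
        student_logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
        print(ensemble_distillation_loss(student_logits, ensemble_logits))
        print(knowledge_uncertainty(ensemble_logits).shape)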
Paper Type: long