RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: automatic speech recognition, quantization, seq2seq, transducer
Abstract: With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique for decreasing the model size, memory access, and compute load of large models. Despite recent advances in quantization aware training (QAT) techniques, most papers present evaluations focused on computer vision tasks, which have different layer compositions and training dynamics than sequence tasks. In this paper, we first benchmark the impact of popular techniques such as the straight-through estimator, pseudo-quantization noise (PQN), learnable scale parameters, and clipping on 4-bit seq2seq models across a suite of speech recognition datasets ranging from 1,000 hours to 1 million hours, as well as one machine translation dataset to illustrate applicability beyond speech. Across these experiments, we find that accuracy suffers when insufficient regularization signal flows back to the outlier weights. We propose constructing the quantization scale as different functions of the outliers so that they are regularized as part of the end-to-end learning problem, outperforming popular learnable-scale and clipping methods. PQN-based QAT shows a larger improvement under the proposed method, which opens up the possibility of exploiting some of its other benefits: 1) training a single model that performs well in mixed-precision mode and 2) improved generalization on long-form speech recognition.
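
The sketch below is not the authors' code; it is a minimal illustration, in PyTorch, of the idea stated in the abstract under one assumed choice of scale function: parameterizing the quantization scale directly from the weight outlier (here, the max-abs value) so that the task loss sends a regularization gradient back to that outlier, combined with pseudo-quantization noise (PQN) in place of hard rounding during training. The function name and the 4-bit symmetric range are illustrative assumptions; the paper's actual family of scale functions may differ.

    # Minimal sketch (assumed, not the paper's implementation) of outlier-aware
    # scale construction with pseudo-quantization noise (PQN) for QAT.
    import torch

    def pqn_fake_quant(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
        qmax = 2 ** (num_bits - 1) - 1              # e.g. 7 for symmetric 4-bit
        scale = w.abs().max() / qmax                # scale is a function of the outlier weight,
                                                    # so that weight receives a regularization gradient
        noise = (torch.rand_like(w) - 0.5) * scale  # uniform noise in [-scale/2, scale/2] (PQN)
        return torch.clamp(w + noise, -qmax * scale, qmax * scale)

    # Usage: wrap a weight before the matmul during training.
    w = torch.randn(256, 256, requires_grad=True)
    x = torch.randn(8, 256)
    y = x @ pqn_fake_quant(w).t()
    y.sum().backward()                              # gradient reaches the max-abs weight via `scale`

Because the scale is computed from the outlier rather than learned as a free parameter or clipped to a fixed threshold, the end-to-end loss can shrink the outlier itself, which is the "norm decay" behavior the abstract describes.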
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6244