Outlier-Free SpeechLM for Fast Adaptation and Robust Quantization

TMLR Paper6233 Authors

17 Oct 2025 (modified: 22 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: We introduce SOFA (Stabilized Outlier-Free Attention), a drop-in replacement for the softmax activation that tackles the attention-outlier problem that arises when turning a text-only LLM into a speech-text multi-modal model (SpeechLM). Our primary observation is that outliers emerge from both multi-modal low-rank adaptation and post-training quantization of transformer attention, degrading the performance of state-of-the-art SpeechLMs. To address these issues, we use a pretrained language model as a foundation and replace its standard softmax attention with SOFA. The method is a plug-in that directly eliminates outliers without adjusting pretrained weights, and we quantitatively measure the prevalence and impact of outliers in a unified speech-text transformer. We evaluate two multi-modal adaptation strategies: (1) full fine-tuning on multi-modal data followed by post-training quantization, and (2) applying LoRA to a SOFA-equipped model (the SOFA-LoRA adapter), which keeps the pretrained LLM frozen and requires no extra pre-training. The full fine-tuning route delivers strong, consistent gains across all modalities (textLM, SpeechLM, ASR, TTS), whereas the SOFA-LoRA adapter, without touching any pretrained weights, surpasses the vanilla-LoRA adapter baseline and is particularly effective on text-output tasks such as ASR, all while retaining full compatibility with standard LLM checkpoints. Empirically, on the OPT-1.3b model, incorporating SOFA into SpeechLM yields an 88% improvement in multi-modal low-rank adaptation and a 37% improvement in post-training quantization.
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=Bz25NI5nVO&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: In response to the comment "The submission doesn't follow TMLR's stylefile format (notably the font isn't the right one). Please fix the format and compare with the existing submission before resubmitting.", we now use the correct format and font size.
Assigned Action Editor: ~Tatiana_Likhomanenko1
Submission Number: 6233