Context-Driven Dynamic Pruning for Large Speech Foundation Models

Published: 26 Aug 2025, Last Modified: 26 Aug 2025 · SpeechAI TTIC 2025 · CC BY 4.0
Keywords: Pruning, Dynamic Pruning, Speech Foundation Model, Speech to Text, Speech Recognition
TL;DR: We apply context-driven dynamic pruning to a speech foundation model using acoustic and speaker embeddings, reducing computation by 56.7 GFLOPs and improving BLEU by a relative 26.1% over full fine-tuning.
Presentation Preference: Open to it if recommended by organizers
Abstract: Speech foundation models generalize well across languages and acoustic conditions, but they require significant computational resources at inference. Prior work on pruning for speech foundation models has studied techniques that dynamically optimize the model structure for the target audio by leveraging external context. In this work, we extend this line of research and propose context-driven dynamic pruning, a technique that adapts the model's computation at inference time to both the input frames and additional context. We employ the Open Whisper-style Speech Model (OWSM) and incorporate speaker embeddings, acoustic event embeddings, and language information as additional context. By incorporating the speaker embedding, our method reduces computation by 56.7 GFLOPs while improving BLEU scores by a relative 26.1% compared to the fully fine-tuned OWSM model.
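To make the idea concrete, below is a minimal PyTorch sketch of context-driven dynamic pruning: a small gating network maps an utterance-level context vector (e.g., a concatenation of speaker, acoustic-event, and language embeddings, as in the abstract) to hard keep/skip decisions for prunable sub-modules of the encoder. All names here (`ContextGate`, `GatedFFN`, `ctx_dim`) are illustrative assumptions, not the paper's implementation; the paper's actual gating targets and training objective may differ.

```python
# A minimal sketch of context-driven dynamic pruning, assuming Transformer
# feed-forward blocks can be skipped per utterance based on a context vector.
import torch
import torch.nn as nn


class ContextGate(nn.Module):
    """Predicts a keep/skip decision for each prunable module from context.

    The context vector could concatenate a speaker embedding, an acoustic
    event embedding, and a language embedding (assumption for illustration).
    """

    def __init__(self, ctx_dim: int, num_modules: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(ctx_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_modules),
        )

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        logits = self.scorer(ctx)          # (batch, num_modules)
        probs = torch.sigmoid(logits)
        # Straight-through estimator: hard 0/1 gates in the forward pass,
        # sigmoid gradients in the backward pass, so gates stay trainable.
        hard = (probs > 0.5).float()
        return hard + probs - probs.detach()


class GatedFFN(nn.Module):
    """A residual feed-forward block that is skipped when its gate is 0."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # gate: (batch, 1); the residual path keeps the block safely skippable.
        return x + gate.unsqueeze(-1) * self.net(x)


if __name__ == "__main__":
    batch, frames, d_model, ctx_dim = 2, 50, 256, 192
    layers = nn.ModuleList(GatedFFN(d_model, 1024) for _ in range(4))
    gate = ContextGate(ctx_dim, num_modules=len(layers))

    x = torch.randn(batch, frames, d_model)   # encoder hidden states
    ctx = torch.randn(batch, ctx_dim)         # speaker/acoustic/language context
    gates = gate(ctx)                         # (batch, 4) hard 0/1 decisions
    for i, layer in enumerate(layers):
        x = layer(x, gates[:, i : i + 1])
    print(x.shape, gates)
```

At deployment, modules whose gate is 0 can be skipped entirely, which is where the FLOPs savings come from; during training, a sparsity penalty on the gate probabilities would typically trade accuracy against compute.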
Submission Number: 23