REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation

ACL ARR 2025 May Submission 1656 Authors

18 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Simultaneous Speech Translation (SimulST) systems stream in audio while simultaneously emitting translated text or speech. Such systems face the significant challenge of balancing translation quality and latency. We introduce a heuristic policy to optimize this tradeoff: wait for more input only if you gain information by doing so. Based on this heuristic, we present Regularized Entropy INformation Adaptation (REINA), a novel loss for training an adaptive policy using an existing non-streaming translation model. We derive REINA from information-theoretic principles and show that it pushes the reported Pareto frontier of the latency/quality tradeoff beyond prior work. Using REINA, we train a model on open-source datasets covering English, French, Spanish, and German for both English-to-other-language (En$\rightarrow$X) and other-language-to-English (X$\rightarrow$En) translation. We achieve state-of-the-art (SOTA) streaming results for models of comparable size, outperforming baseline methods. We also introduce a metric for streaming efficiency, quantitatively showing that REINA improves the latency/quality tradeoff by as much as 21\% compared to prior approaches, normalized against non-streaming baseline BLEU scores.
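
The abstract's heuristic, wait for more input only if doing so yields an information gain, can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the paper's actual REINA loss: the function names (`token_entropy`, `wait_or_write`), the threshold `gain_threshold`, and the use of a non-streaming teacher model to score both a short and a longer audio prefix are hypothetical choices introduced here purely for exposition.

```python
import torch
import torch.nn.functional as F


def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of the next-token distribution per batch item.

    logits: tensor of shape (batch, vocab_size).
    """
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1)


def wait_or_write(logits_short_prefix: torch.Tensor,
                  logits_longer_prefix: torch.Tensor,
                  gain_threshold: float = 0.1) -> torch.Tensor:
    """Hypothetical wait/write decision based on entropy reduction.

    During training, a non-streaming model can score the next target token
    given (a) the audio prefix seen so far and (b) a longer prefix. If the
    longer prefix reduces next-token entropy by more than `gain_threshold`
    nats, waiting is deemed informative (return True = WAIT); otherwise the
    policy should emit the token now (return False = WRITE).
    """
    info_gain = token_entropy(logits_short_prefix) - token_entropy(logits_longer_prefix)
    return info_gain > gain_threshold
```

In this sketch the decision is a hard threshold on the per-token information gain; the paper instead trains an adaptive policy with a regularized loss, which this illustration does not attempt to reproduce.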
Paper Type: Long
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: speech translation, streaming, simultaneous speech translation, scaling, MT theory, modeling, spoken language translation, automatic speech recognition
Contribution Types: NLP engineering experiment, Approaches for low-compute settings / efficiency, Theory
Languages Studied: English, Spanish, German, French
Submission Number: 1656