NAIST Simultaneous Speech Translation System for IWSLT 2025

Haotian Tan, Ruhiyah Widiaputri, Jan Meyer Saragih, Yuka Ko, Katsuhito Sudoh, Satoshi Nakamura, Sakriani Sakti

Published: 2025, Last Modified: 26 May 2026, Venue: IWSLT 2025, License: CC BY-SA 4.0
Abstract: This paper describes the NAIST submission to the English-to-German, English-to-Japanese, and English-to-Chinese Simultaneous Speech-to-Text track at IWSLT 2025. Last year, our system was based on an end-to-end speech-to-text translation model that combined HuBERT and mBART. This year, the system consists of a Whisper encoder, the DeCo compressive projector, and the Qwen large language model. The simultaneous translation (SimulST) system is implemented by applying a local agreement policy to an offline-trained translation model. For the streaming translation (StreamST) system, we integrate an online version of the SHAS segmenter into our SimulST architecture. Our results demonstrate that adopting LLMs as the backbone architecture for speech translation tasks yields strong translation performance. Additionally, leveraging the robust segmentation capability of SHAS for StreamST achieves a good quality-latency trade-off when processing unbounded audio streams.
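The local agreement policy mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common two-hypothesis variant: after each new audio chunk the offline model re-translates the input so far, and only the longest common prefix of the two most recent hypotheses is committed. The abstract does not specify the agreement window or token granularity used in the submission, so the class name `LocalAgreement`, the helper `common_prefix`, and the token-list interface are all hypothetical.

```python
from typing import List, Optional


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Return the longest common prefix of two token sequences."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement:
    """Hypothetical two-hypothesis local agreement policy (illustration only)."""

    def __init__(self) -> None:
        self.prev: Optional[List[str]] = None  # hypothesis from the previous chunk
        self.committed: List[str] = []         # tokens already emitted

    def step(self, hypothesis: List[str]) -> List[str]:
        """Take the hypothesis for the audio seen so far; return newly committed tokens."""
        if self.prev is None:
            # Nothing to agree with yet, so nothing is emitted.
            self.prev = hypothesis
            return []
        agreed = common_prefix(self.prev, hypothesis)
        self.prev = hypothesis
        # Emit only the part of the agreed prefix that has not been emitted before.
        new_tokens = agreed[len(self.committed):]
        self.committed.extend(new_tokens)
        return new_tokens


policy = LocalAgreement()
first = policy.step(["Das", "ist"])
second = policy.step(["Das", "ist", "ein", "Test"])
third = policy.step(["Das", "ist", "ein", "Haus"])
```

In this sketch a token is emitted only once two consecutive hypotheses agree on it, which trades some latency for stability. A practical system would typically also force the committed prefix during decoding so that later hypotheses always extend it; that step is omitted here.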