Automatic Speech-to-Speech Translation of Educational Videos Using SeamlessM4T and Its Use for Future VR Applications

Lucas Rafael Stefanel Gris; Diogo Fernandes; Frederico Santos de Oliveira

Published: 01 Jan 2024, Last Modified: 21 May 2025 | VR Workshops 2024 | CC BY-SA 4.0

Abstract: Automatic Speech-to-Speech Translation (S2ST) is crucial for VR, providing immersive experiences and global accessibility. Cascade pipelines are often used for this task, but they face challenges in low-resource languages due to data scarcity, complexity, and maintenance burden. End-to-end models, though promising, are still in early development. This study explores the latest SeamlessM4T model, an end-to-end S2ST architecture showing great potential for VR applications, and discusses its strengths and limitations in the context of educational VR for low-resource languages.
