Tailored design of Audio-Visual Speech Recognition models using Branchformers

David Gimeno-Gómez, Carlos D. Martínez-Hinarejos

Published: 2025, Last Modified: 31 Jul 2025, Venue: Comput. Speech Lang. 2025, License: CC BY-SA 4.0

Abstract: Highlights
• Novel framework to design tailored, unified, and parameter-efficient AVSR systems.
• First to harness the flexibility and interpretability of the Branchformer encoder.
• Experiments for English and Spanish show our AVSR framework’s effectiveness.
• Competitive state-of-the-art performance with nearly 50% fewer model parameters.
• Explainable insights into audiovisual speech processing.

External IDs: dblp:journals/csl/GimenoGomezM25