Tailored design of Audio-Visual Speech Recognition models using Branchformers

Published: 01 Jan 2025. Last Modified: 31 Jul 2025. Venue: Comput. Speech Lang. 2025. License: CC BY-SA 4.0.
Abstract: Highlights
• A novel framework for designing tailored, unified, and parameter-efficient AVSR systems.
• The first work to harness the flexibility and interpretability of the Branchformer encoder.
• Experiments on English and Spanish demonstrate the effectiveness of our AVSR framework.
• Performance competitive with the state of the art, using nearly 50% fewer model parameters.
• Explainable insights into audiovisual speech processing.