Target speaker lipreading by audio-visual self-distillation pretraining and speaker adaptation

Published: 01 Jan 2025, Last Modified: 16 Apr 2025. Expert Syst. Appl. 2025. License: CC BY-SA 4.0
Abstract: Highlights
• Cross-lingual transfer learning enhances lipreading with limited target language data.
• Speaker adaptation boosts specific speaker lipreading accuracy.
• Model ensembling with lip and face inputs improves lipreading.
• Our approach sets a new benchmark on the ChatCLR dataset.