CATNet: Cross-modal fusion for audio-visual speech recognition

Published: 01 Jan 2024, Last Modified: 21 Jul 2025
Venue: Pattern Recognit. Lett. 2024
License: CC BY-SA 4.0
Abstract: Highlights
• Proposes a novel cross-modal audio–visual speech recognition network, named CATNet.
• Devises a cross-modal bidirectional fusion model.
• Devises an audio–visual dual-modal speech recognition network.
• CATNet is robust to noise and outperforms other benchmark methods.