Audio-visual speech recognition based on regulated transformer and spatio-temporal fusion strategy for driver assistive systems
Abstract: Highlights
• A novel transformer-based method for audio–visual speech command recognition.
• Novel fusion strategies of audio–visual features and classifier ensemble.
• An attention visualization approach for audio–visual feature impact assessment.
• A software application of the transformer-based method for driver assistive systems.