Efficient audio-visual information fusion using encoding pace synchronization for Audio-Visual Speech Separation
Abstract:
Highlights
• Propose an Encoding Pace Synchronization Network for AVSS.
• Allow information to be encoded at the respective paces of the audio and visual modalities.
• Synchronize the encoding paces of the audio and visual modalities.
• Preserve the distinct characteristics of each modality.