Directional Source Separation for Robust Speech Recognition on Smart Glasses

Published: 2025 (ICASSP 2025). Last Modified: 25 Mar 2026. License: CC BY-SA 4.0.
Abstract: Modern smart glasses leverage machine learning to offer real-time transcription, considerably enriching human communication. However, such systems frequently struggle with environmental noise, which degrades speech recognition accuracy. To improve voice quality, this work investigates directional source separation using the glasses' multi-microphone array. We explore multiple beamformers that assist source separation by strengthening the directional properties of speech signals. Beyond relying on predetermined beamformers, we investigate neural beamforming for multi-channel source separation and show that automatically learning directional characteristics effectively improves separation quality. Furthermore, we investigate training strategies for ASR when using the separated outputs. Our results suggest that jointly training the directional speech separation model and the ASR model achieves the best overall performance while balancing recognition performance between the wearer and the conversation partner.
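To make the idea of a predetermined (fixed) beamformer concrete, here is a minimal delay-and-sum sketch in plain Python. It is a generic textbook technique, not the authors' method: the abstract does not say which beamformers were evaluated. The four-microphone uniform linear array, 2 cm spacing, 48 kHz sample rate, 30° source direction, and all function names (`steering_delays`, `simulate_array`, `delay_and_sum`, `mse`) are hypothetical. Delays are rounded to whole samples for simplicity.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air at room temperature


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Integer-sample arrival delays for a far-field source at a uniform linear array.

    angle_deg is measured from the array axis (90 degrees = broadside).
    Delays are shifted so the earliest-arriving microphone has delay 0.
    Rounding to whole samples is a simplification; real systems use fractional delays.
    """
    raw = [
        m * spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND * sample_rate
        for m in range(num_mics)
    ]
    earliest = min(raw)
    return [round(r - earliest) for r in raw]


def simulate_array(clean, delays, noise_std, seed=0):
    """Hypothetical test input: mic m hears clean[n - delays[m]] plus independent white noise."""
    rng = random.Random(seed)
    length = len(clean) + max(delays)
    channels = []
    for d in delays:
        channels.append([
            (clean[n - d] if 0 <= n - d < len(clean) else 0.0) + rng.gauss(0.0, noise_std)
            for n in range(length)
        ])
    return channels


def delay_and_sum(channels, delays):
    """Advance each channel by its steering delay, then average the time-aligned samples."""
    length = min(len(ch) for ch in channels) - max(delays)
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    fs = 48_000
    delays = steering_delays(num_mics=4, spacing_m=0.02, angle_deg=30.0, sample_rate=fs)
    clean = [math.sin(2 * math.pi * 500 * n / fs) for n in range(4_800)]
    mics = simulate_array(clean, delays, noise_std=0.5)
    enhanced = delay_and_sum(mics, delays)
    print("delays (samples):", delays)
    print("single-mic MSE:", round(mse(mics[0][: len(clean)], clean), 4))
    print("beamformed MSE:", round(mse(enhanced, clean), 4))
```

Because the noise is independent across the four channels, averaging the aligned signals should leave the 500 Hz tone intact while reducing the residual noise power to roughly a quarter of a single microphone's. A fixed design like this steers toward an assumed direction; data-dependent or learned beamformers instead adapt their weights to the observed signals.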