Adaptive Speech Intelligibility Enhancement for Far-and-Near-end Noise Environments Based on Self-attention StarGAN

2022 (modified: 10 Nov 2022), MMM (2) 2022
Abstract: In adverse noisy environments, listeners struggle to extract information even when the device outputs clear speech. Exploiting the Lombard effect, previous studies converted normal, noise-free speech into Lombard speech at the near end. However, these methods ignore noise at the far end and perform poorly under strong near-end noise interference at very low signal-to-noise ratios (SNRs). In this paper, we propose Adaptive Self-Attention StarGAN (AdaSAStarGAN), an adaptive Speech Style Conversion (SSC) scheme for far-and-near-end ambient noise. The StarGAN generator combines a self-attention mechanism and adaptive instance normalization (AdaIN) with convolutional neural networks (CNNs). In addition, the model is trained on a corpus recorded under different noise conditions. Subjective and objective evaluations show that the method yields better intelligibility and naturalness in various far-and-near-end noise environments, especially at low SNRs. It enables flexible conversion between normal speech and multi-level Lombard speech, making speech intelligibility enhancement more widely applicable in practice.
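The abstract names two building blocks that the generator combines with CNNs: AdaIN and self-attention. The paper's exact layer configuration is not given here, so the following is only a minimal, dependency-free Python sketch of the two operations as they are commonly defined, applied to a (channels x frames) feature map such as a CNN might produce from a spectrogram. The weight matrices, the residual scale `gamma`, the helper names (`adain`, `self_attention`, `rand_matrix`), and the idea that the AdaIN statistics stand in for a target Lombard level are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # feature map laid out as (channels, frames)


def adain(content: Matrix, style_mean: List[float], style_std: List[float],
          eps: float = 1e-5) -> Matrix:
    """Adaptive instance normalization: normalize each channel over time,
    then re-scale and re-shift it with per-channel style statistics."""
    out = []
    for c, row in enumerate(content):
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        sigma = math.sqrt(var + eps)
        out.append([style_std[c] * (v - mu) / sigma + style_mean[c] for v in row])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: a (n, k) times b (k, m) gives (n, m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                   gamma: float) -> Matrix:
    """SAGAN-style self-attention over time frames.
    x: (C, T); wq, wk: (C', C); wv: (C, C). Returns gamma * attended + x."""
    q, k, v = matmul(wq, x), matmul(wk, x), matmul(wv, x)  # act like 1x1 convolutions
    n_frames = len(x[0])
    # attn[i][j]: how much query frame i attends to frame j (each row sums to 1)
    attn = [softmax([sum(q[c][i] * k[c][j] for c in range(len(q)))
                     for j in range(n_frames)])
            for i in range(n_frames)]
    attended = [[sum(v[c][j] * attn[i][j] for j in range(n_frames))
                 for i in range(n_frames)]
                for c in range(len(v))]
    # Residual connection: with gamma = 0 the block is an identity map.
    return [[gamma * attended[c][t] + x[c][t] for t in range(n_frames)]
            for c in range(len(x))]


def rand_matrix(rows: int, cols: int) -> Matrix:
    """Hypothetical random weights, used only for illustration."""
    return [[random.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    channels, frames = 4, 6
    feats = rand_matrix(channels, frames)  # stand-in for CNN encoder output
    # Hypothetical per-channel statistics standing in for a target Lombard level.
    styled = adain(feats, style_mean=[0.5] * channels, style_std=[1.5] * channels)
    mixed = self_attention(styled, rand_matrix(2, channels), rand_matrix(2, channels),
                           rand_matrix(channels, channels), gamma=0.1)
    print(len(mixed), "channels x", len(mixed[0]), "frames")
```

AdaIN lets a single generator produce different output styles by swapping only the normalization statistics, which is one plausible way to support conversion to several Lombard levels. Self-attention lets every frame draw on every other frame, complementing the local receptive field of the convolutions.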