Speech Intelligibility Enhancement By Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGANOpen Website

2022 (modified: 10 Nov 2022)MMM (1) 2022Readers: Everyone
Abstract: Speech intelligibility enhancement is a perceptual enhancement technique for clean speech reproduced in noisy environments. Many studies enhance speech intelligibility by speaking style conversion (SSC), which relies solely on the Lombard effect does not work well in strong noise interference. They also model the conversion of fundamental frequency (F0) with a straightforward linear transform and map only a very few dimensions Mel-cepstral coefficients (MCEPs). As F0 and MCEPs are critical aspects of hierarchical intonation, we believe that adequate modeling of these features is essential. In this paper, we make a creative study of continuous wavelet transform (CWT) to decompose F0 into ten temporal scales that describe speech at different time resolutions for effective F0 conversion, and we also express MCEPs with 20 dimensions over baseline 10 dimensions for MCEPs conversion. We utilize an iMetricGAN network to optimize the speech intelligibility metrics in strong noise. Experimental results show that proposed Non-Parallel Speech Style Conversion using CWT and iMetricGAN based CycleGAN (NS-CiC) method outperforms the baselines that significantly increased speech intelligibility in robust noise environments in objective and subjective evaluations.
0 Replies

Loading