Speech separation by cost-sensitive deep learning

Xiao-Lei Zhang

Published: 2017, Last Modified: 15 May 2025, Venue: APSIPA 2017, License: CC BY-SA 4.0

Abstract: Deep learning based speech separation has demonstrated good performance in adverse environments. Recent studies show that multi-condition training, which trains a model on several noise scenarios, generalizes well at test time. However, treating all noise scenarios with the same training cost is usually not a good choice: when the training data cover a wide range of SNRs, the data from low-SNR environments suffer from large training loss, which results in a performance drop when test SNRs are low. In this paper, we propose three cost-sensitive deep learning methods to improve the performance of speech separation at low SNRs: (i) learning with a cost-sensitive objective, (ii) learning with cost-sensitive oversampling of training data, and (iii) learning with cost-sensitive undersampling of training data. We also propose to aggregate the three methods into a cost-sensitive deep ensemble learning method. Experimental results demonstrate the effectiveness of the proposed methods.
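As a rough illustration of the three strategies named in the abstract, the following NumPy sketch shows one way SNR-dependent costs could enter training. The abstract does not give the actual cost function, sampling rule, or ensemble rule, so everything here is an assumption made for illustration: the linear weighting in `snr_weights`, its `low`, `high`, and `max_weight` parameters, the repeat-by-weight oversampling, the keep-with-probability undersampling, and the plain averaging in `ensemble_average` are all hypothetical.

```python
import numpy as np

def snr_weights(snr_db, low=-5.0, high=5.0, max_weight=3.0):
    """Hypothetical cost: grows linearly as SNR drops, clipped to [1, max_weight]."""
    snr_db = np.asarray(snr_db, dtype=float)
    frac = np.clip((high - snr_db) / (high - low), 0.0, 1.0)  # 1 at low SNR, 0 at high SNR
    return 1.0 + (max_weight - 1.0) * frac

def cost_sensitive_mse(pred, target, snr_db):
    """(i) Cost-sensitive objective: per-example MSE weighted by the SNR cost."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    w = snr_weights(snr_db)
    per_example = np.mean((pred - target) ** 2, axis=1)
    return float(np.sum(w * per_example) / np.sum(w))

def oversample_indices(snr_db):
    """(ii) Oversampling: repeat each example round(cost) times (assumed rule)."""
    w = snr_weights(snr_db)
    return np.repeat(np.arange(len(w)), np.rint(w).astype(int))

def undersample_indices(snr_db, rng):
    """(iii) Undersampling: keep each example with probability cost / max cost (assumed rule)."""
    w = snr_weights(snr_db)
    keep = rng.random(len(w)) < w / w.max()
    return np.flatnonzero(keep)

def ensemble_average(predictions):
    """Ensemble: average the outputs of the separately trained models (assumed rule)."""
    return np.mean(np.stack(predictions), axis=0)

# Toy usage: three utterances at -5, 0, and 5 dB SNR, two output units each.
snr = [-5.0, 0.0, 5.0]
pred = np.array([[2.0, 2.0], [0.0, 0.0], [0.0, 0.0]])
target = np.zeros((3, 2))
loss = cost_sensitive_mse(pred, target, snr)
over = oversample_indices(snr)
under = undersample_indices(snr, np.random.default_rng(0))
```

In this toy setup the only error sits on the lowest-SNR utterance, so the weighted loss is larger than the unweighted mean would be, which is the intended effect of emphasizing low-SNR data. The oversampled index list repeats low-SNR examples, while the undersampled list always retains the lowest-SNR example and drops higher-SNR ones at random.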