Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Pre-trained model, speech separation, modularization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: A previous study (Huang et al., 2022) explored whether general audio pre-trained models generate representations suitable for speech separation, with the main finding being that they provide minimal benefit compared to features extracted without the models. The study hypothesised that because the general audio pre-trained models were trained on clean audio datasets, they are unable to generalize to noisy and mixed speech and are therefore ineffective for speech separation. This paper investigates this hypothesis by comparing the performance of a pre-trained model trained on contaminated speech with that of one trained on clean speech. We are interested in evaluating whether contamination leads to better downstream performance. We also investigate whether the type of input used to train the pre-trained model affects the quality of the embeddings it generates. To separate the sources, we propose a fully unsupervised speech separation technique based on deep modularization. Our findings establish that by injecting noise and reverberation into the training dataset, the pre-trained model generates significantly better embeddings than when a clean dataset is used. Further, for the model presented here, working in the short-time Fourier transform (STFT) domain yields better features than using time-domain features. The proposed deep modularization speech separation technique improves SI-SNRi and SDRi by 1.3 and 2.7, respectively, when mixtures contain fewer than four sources, and improves results significantly for mixtures with many sources.
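The contamination procedure the abstract describes (injecting noise and reverberation into the training data, then extracting STFT features) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the target SNR, the toy exponential-decay room impulse response, and the STFT parameters are all assumptions chosen for demonstration.

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Scale the noise to a target SNR (in dB) and add it to the clean signal."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def add_reverb(signal, rir):
    """Simulate reverberation by convolving with a room impulse response."""
    return np.convolve(signal, rir)[: len(signal)]

def stft_features(signal, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window; returns (num_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    frames = [
        np.abs(np.fft.rfft(window * signal[i : i + n_fft]))
        for i in range(0, len(signal) - n_fft + 1, hop)
    ]
    return np.stack(frames)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # stand-in for 1 s of clean speech at 16 kHz
noise = rng.standard_normal(16000)
rir = np.exp(-np.arange(2000) / 400) * rng.standard_normal(2000)  # toy decaying RIR

contaminated = add_reverb(add_noise(clean, noise, snr_db=5.0), rir)
feats = stft_features(contaminated)
print(feats.shape)  # (122, 257)
```

In practice the noise and impulse responses would come from recorded corpora rather than synthetic signals, but the structure (additive noise at a controlled SNR, convolutional reverberation, then STFT feature extraction) is the same.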
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 908