Keywords: speech separation, single-channel, source separation, sequence modelling, differencing, signal processing
Abstract: Speech separation is the task of taking a mixture of overlapping speech signals as input and producing estimates of the clean speech signals that make up the mixture. In this paper, we propose a novel sequence modelling method called relative context and use it in a speech separation architecture called RCSep.
The main advantages of relative context are that it requires no trainable parameters, is very lightweight, and is highly parallelizable. The RCSep model, which relies heavily on relative context, is an extremely efficient source separation model. It has fewer than 500k trainable parameters, has lower memory usage, and is significantly faster than all previous source separation methods while still maintaining high separation accuracy.
Furthermore, we replaced the LSTMs in a current state-of-the-art (SOTA) architecture with relative context, which simultaneously improved separation accuracy and reduced computation time, memory usage, and model size.
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2646