Semantic Grouping Network for Audio Source Separation

22 Sept 2022 (modified: 13 Feb 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone
Keywords: audio source separation, audio-visual separation
TL;DR: We propose a novel Semantic Grouping Network, termed SGN, that can disentangle sound representations and extract high-level semantic information for each source to guide separation.
Abstract: Audio source separation is a typical yet challenging problem that aims to separate individual sources from an audio mixture. Recently, audio-visual separation approaches have taken advantage of the natural synchronization between the two modalities to boost separation performance. They extract high-level semantics from visual inputs as guidance to help disentangle the sound representations of individual sources. Can we instead learn to disentangle the individual semantics directly from the sound itself? The difficulty is that multiple sound sources are mixed together in the original space. To tackle this, we present a novel Semantic Grouping Network, termed SGN, that directly disentangles sound representations and extracts high-level semantic information for each source from the input audio mixture. Specifically, SGN aggregates category-wise source features through learnable class tokens of sounds. The aggregated semantic features then serve as guidance to separate the corresponding audio sources from the mixture. The proposed audio source separation framework is simple, flexible, and scalable. Compared to existing sound separation methods, our framework supports audio separation from a flexible number of sources and generalizes to sound sources from different domains. We conducted extensive experiments on both music-only and universal sound separation benchmarks: MUSIC and FUSS. The results demonstrate that our SGN significantly outperforms previous audio-only methods as well as audio-visual models, without utilizing additional visual cues.
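The abstract describes the grouping mechanism only at a high level; the following is a minimal sketch, assuming a PyTorch setting, of how learnable per-category class tokens could cross-attend over encoded mixture features to aggregate category-wise semantics that then condition a per-source mask predictor. All module and variable names (SemanticGroupingSketch, group_attn, mask_head, the feature dimensions) are illustrative assumptions, not the authors' implementation.

# Sketch (not the authors' code) of the semantic-grouping idea: learnable
# class tokens pool category-wise features from the mixture, and the pooled
# semantics guide per-source mask prediction.
import torch
import torch.nn as nn

class SemanticGroupingSketch(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 256, num_heads: int = 4):
        super().__init__()
        # One learnable token per sound category (e.g., instrument classes).
        self.class_tokens = nn.Parameter(torch.randn(num_classes, feat_dim))
        # Cross-attention: tokens (queries) aggregate mixture features (keys/values).
        self.group_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Mask head conditioned on the aggregated per-class semantics.
        self.mask_head = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 1)
        )

    def forward(self, mix_feats: torch.Tensor) -> torch.Tensor:
        # mix_feats: (B, T, D) encoder features of the mixture.
        B, T, _ = mix_feats.shape
        tokens = self.class_tokens.unsqueeze(0).expand(B, -1, -1)      # (B, C, D)
        # Each class token aggregates the mixture features of its category.
        grouped, _ = self.group_attn(tokens, mix_feats, mix_feats)     # (B, C, D)
        C = grouped.size(1)
        # Broadcast each class embedding over time and predict a per-source mask.
        cond = torch.cat(
            [grouped.unsqueeze(2).expand(-1, -1, T, -1),
             mix_feats.unsqueeze(1).expand(-1, C, -1, -1)], dim=-1)    # (B, C, T, 2D)
        return torch.sigmoid(self.mask_head(cond)).squeeze(-1)         # (B, C, T)

if __name__ == "__main__":
    model = SemanticGroupingSketch(num_classes=4)
    masks = model(torch.randn(2, 100, 256))
    print(masks.shape)  # torch.Size([2, 4, 100])

In such a sketch, the masks would be applied to the mixture representation to recover each source; the key design point mirrored from the abstract is that guidance comes from semantics grouped out of the audio itself rather than from visual inputs.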
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)